ABSTRACT

In a 1974 paper, the author indicated how natural conjugate priors for
multidimensional exponential family likelihoods could be enriched in certain
cases through linear transformations of independent marginal priors. In
particular, it was shown how the usual Normal-Wishart prior for the multinormal
distribution with unknown mean vector and precision matrix could have the
number of hyperparameters increased; the "thinness" of the traditional prior
is well known. The new, linearly-dependent prior leads to full-dimensional
credibility prediction formulae for the observational mean vector and
covariance matrix, as contrasted with the simpler, self-dimensional forecasts
obtained in prior literature. However, there was an error made in the
sufficient-statistics term of the covariance predictor, which is corrected in
this work.

In addition, this paper explains in detail the properties of the enriched
multinormal prior and why revised statistics are needed, and interprets the
important relationship between the linear transformation matrix and the
matrix of credibility time constants. An enumeration of the additional number
of hyperparameters needed for the enriched prior shows its value in modelling
multinormal problems; it is shown that the estimation of these
hyperparameters can be carried out in a natural way, in the space of the
observable variables.
ENRICHED MULTINORMAL PRIORS REVISITED


by 

William  S.  Jewell 


1.  INTRODUCTION 

The  multivariate  Normal  distribution  continues  to  occupy  a  central  role 
in  Bayesian  analysis,  not  only  because  it  is  the  assumed  or  the  limiting 
distribution  in  many  practical  models,  but  also  because  it  provides  direct 
access  to  the  study  of  covariance  between  random  variables.  If  both  the 
mean  and  the  covariance  are  random  parameters,  then  the  usual  informative 
prior that is assumed is the Normal-Wishart, due to Ando and Kaufman (1965).
However, it is well known that this prior is too "thin", that is, it has only
a small number of hyperparameters (see, e.g., Press (1981)); this in turn
limits  the  modelling  of  prior  experience.  As  we  shall  see,  a  similar 
problem  occurs  in  finding  a  natural  conjugate  prior  for  any  distribution 
from  the  multivariate  exponential  family. 

In (1974b), the author suggested a method for "enriching" the multivariate
prior through linear transformations on independent marginals, thus
introducing more hyperparameters. This approach successfully enriched the
"credibility formula" for the vector mean of the Bayesian predictive
distribution, which was also too thin with the Normal-Wishart prior. However,
because the original article was in a European actuarial journal, it received
little attention, and it is hoped that this paper will encourage further work
on this difficult problem. We also take this opportunity to correct several
typographic errors and two erroneous formulae in the original work.

We begin with a brief review of the credibility theory that motivated
this research and the problem of finding natural conjugate priors for
multivariate distributions.
2.  CREDIBILITY THEORY

Credibility theory is the name given by American actuaries to an
approximate formula developed in the 1920's to forecast the mean of future
observations (of insurance claims). In modern terminology, we would say
that the problem is one of finding the best linear least-squares
approximation to the mean of the Bayesian predictive density. The literature
of credibility theory has since grown rapidly; convenient articles showing the
variety of models are: Norberg (1979), Kahn (ed.) (1975), and De Vylder
(1982). Of course, this theory has many results in common with traditional
statistics and with linear filter theory; see, e.g., Diaconis and Ylvisaker
(1979) and Feinberg (1980).

To illustrate the basic model development that motivated the work on
enriched priors, consider the usual Bayesian set-up in which a $p$-dimensional
vector of random variables, $\mathbf{x}$, defined over some fixed space $X$ in
$R^p$, depends upon an abstract (vector or scalar) parameter $\theta$ in a
space $\Theta$ through a likelihood density $p(\mathbf{x} \mid \theta)$;
$\theta$ is considered also to be random, with a known prior-parameter density
$p(\theta)$, developed through previous experience or personal belief. In
actuarial science, as in economics, there are few philosophical barriers to
such assumptions. If $n$ independent samples of $\mathbf{x}$ are observed with
fixed $\theta$, then the posterior-to-data density of $\theta$ is given by
Bayes' law:

$$p(\theta \mid \mathcal{D}_x) \propto \prod_{t=1}^{n} p(\mathbf{x}_t \mid \theta)\, p(\theta) \tag{1}$$

(we omit the normalization), where $\mathcal{D}_x$ is the data set
$\{\mathbf{x}_t \mid t = 1, \dots, n\}$.

* Notational remarks and distributional results are given in the Appendix.

In insurance applications, the focus of interest is not on $\theta$ (which
represents some abstract property of the insured risk), but rather on the
forecast of some future outcome of $\mathbf{x}$, given that $\theta$ remains
constant (i.e., on the distribution of future claims from the same insured).
Thus, attention shifts from (1) to the predictive density:

$$p(\mathbf{x} \mid \mathcal{D}_x) = \int p(\mathbf{x} \mid \theta)\, p(\theta \mid \mathcal{D}_x)\, d\theta . \tag{2}$$


Ideally, the actuary would prefer to express his prior experience not
through $p(\theta)$, but through the prior-outcome (marginal) density:

$$p(\mathbf{x}) = \int p(\mathbf{x} \mid \theta)\, p(\theta)\, d\theta . \tag{3}$$

However, it will be seen that it is not possible to completely avoid
consideration of the structure of $p(\theta)$.

The mean of (2) is the "fair future premium" that is of central interest
in insurance, and in the 1920's, American actuaries developed a
one-dimensional approximation formula through practical arguments, which we
may write:

$$\mathcal{E}\{x \mid \mathcal{D}_x\} \approx (1 - z_{00})\, m + z_{00}\, \bar{x} ; \qquad z_{00} = \frac{n}{n_{00} + n} . \tag{4}$$


Here, $m = \mathcal{E}\{x\}$ is the mean prior outcome ("the manual fair
premium"), and $\bar{x} = \frac{1}{n} \sum_{t=1}^{n} x_t$ is the usual sample
mean ("the experience fair premium"). $z_{00}$ is the "credibility" factor
which mixes the predictor from prior experience, $m$, with the predictor from
experimental evidence, $\bar{x}$; $z_{00} \to 1$ as $n \to \infty$. $n_{00}$,
the time constant of learning from experience, was set empirically.

This credibility method of "experience rating" has worked well in the
insurance industry since that time.
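
As a minimal numerical sketch of the one-dimensional formula (4), the following Python fragment mixes a prior mean with a sample mean. The function name and the premium figures are illustrative assumptions, not taken from the original work.

```python
def credibility_forecast(prior_mean, observations, n00):
    """Formula (4): mix the prior mean m with the sample mean, z00 = n / (n00 + n)."""
    n = len(observations)
    if n == 0:
        # With no data the credibility factor is zero and the forecast is m.
        return prior_mean
    z00 = n / (n00 + n)
    sample_mean = sum(observations) / n
    return (1.0 - z00) * prior_mean + z00 * sample_mean


# Hypothetical numbers: manual premium 100, time constant 5, three observed years.
forecast = credibility_forecast(100.0, [80.0, 90.0, 70.0], n00=5.0)
```

With three observations and a time constant of five, the experience receives weight 3/8, so the forecast moves only part of the way from the manual premium toward the sample mean.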

With the renewal of interest in Bayesian formulae in the '50s, Bailey
and Mayerson showed that (4) was, in fact, exact for certain simple
likelihoods $p(x \mid \theta)$ and conjugate priors $p(\theta)$. Bühlmann
(1967) then proved the important result that, for arbitrary one-dimensional
priors and likelihoods, (4) was the best linear least-squares approximation,
provided that the time constant $n_{00}$ is chosen to be:

$$n_{00} = \mathcal{E}\mathcal{V}\{x \mid \theta\} / \mathcal{V}\mathcal{E}\{x \mid \theta\} . \tag{5}$$

These moments are recognizable as the components of the total variance
$\mathcal{V}\{x\}$ of (3), and show to what extent the structure of inter-risk
and intra-risk variability needs to be specified, a priori.
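
To make (5) concrete, here is a small sketch for one familiar conjugate pair, chosen here purely as an illustration: a Poisson likelihood with a Gamma prior (shape `alpha`, rate `beta`). For this assumed model the time constant reduces to `beta`, and the credibility forecast (4) coincides with the exact posterior mean.

```python
def buhlmann_time_constant(expected_conditional_variance, variance_of_conditional_mean):
    """Formula (5): n00 = E V{x | theta} / V E{x | theta}."""
    return expected_conditional_variance / variance_of_conditional_mean


# Assumed model: x | theta ~ Poisson(theta), theta ~ Gamma(shape=alpha, rate=beta).
alpha, beta = 3.0, 2.0
ev = alpha / beta        # E V{x | theta} = E{theta}
ve = alpha / beta**2     # V E{x | theta} = V{theta}
n00 = buhlmann_time_constant(ev, ve)

claims = [0, 2, 1, 4]    # hypothetical claim counts
n = len(claims)
m = alpha / beta         # prior mean of x
z00 = n / (n00 + n)
credibility_mean = (1 - z00) * m + z00 * sum(claims) / n

# Exact Bayesian answer for the Gamma-Poisson pair.
exact_posterior_mean = (alpha + sum(claims)) / (beta + n)
```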

In (1974a), the author showed that (4) was also exact for the linear
exponential family, whose likelihood can be written:

$$p(x \mid \theta) = \frac{a(x)\, e^{-\theta x}}{c(\theta)} , \qquad (x \in X) \tag{6}$$

provided that the natural conjugate prior density,

$$p(\theta) = \frac{[c(\theta)]^{-n_{00}}\, e^{-\theta x_0}}{d(n_{00}, x_0)} , \qquad (\theta \in \Theta) \tag{7}$$

is used over the complete parameter space $\Theta$ for which $c(\theta)$ is
finite, and provided that $p(\theta) \to 0$ at both ends of the range
(Jewell, 1975a). It turns out that the hyperparameter $n_{00}$ is just the
time constant $n_{00}$ of Bühlmann.
With these useful results relating the one-dimensional credibility
approximation with corresponding Bayesian formulae for which (4) is exact,
it was natural to consider extensions to higher dimensions. In an
unpublished 1973 report, the author showed that the $p$-dimensional analogue
to the Bühlmann result is:

$$\mathcal{E}\{\mathbf{x} \mid \mathcal{D}_x\} \approx (\mathbf{I} - \mathbf{Z})\, \mathbf{m} + \mathbf{Z}\, \bar{\mathbf{x}} , \tag{8}$$

where now the linear approximation minimizes the sum of squared-errors in
each dimension, $\bar{\mathbf{x}}$ is the vector sample mean:

$$\bar{\mathbf{x}} = \frac{1}{n} \sum_{t=1}^{n} \mathbf{x}_t , \tag{9}$$

and $\mathbf{I}$ is the $p$-dimensional unit matrix. To develop the remaining
first and second moments, we define the conditional vector mean and
conditional matrix covariance using $p(\mathbf{x} \mid \theta)$:

$$\mathbf{m}(\theta) = \mathcal{E}\{\mathbf{x} \mid \theta\} ; \qquad \mathbf{C}(\theta) = \mathcal{C}\{\mathbf{x} \mid \theta\} , \tag{10}$$

and obtain the three unconditional first and second moments:

$$\mathbf{m} = \mathcal{E}\,\mathbf{m}(\theta) ; \qquad \mathbf{D} = \mathcal{C}\{\mathbf{m}(\theta)\} ; \qquad \mathbf{E} = \mathcal{E}\{\mathbf{C}(\theta)\} ; \qquad \mathbf{C} = \mathbf{D} + \mathbf{E} = \mathcal{V}\{\mathbf{x}\} . \tag{11}$$

$\mathbf{E}$ and $\mathbf{D}$ are sometimes called the intra-risk and
inter-risk components of variance, respectively. As in the one-dimensional
case, the analyst must estimate not only $\mathbf{m}$ and $\mathbf{C}$ in $X$,
but must know enough about the parameterization to estimate the two
components of covariance.
(8) is seen to again be a mixture of the mean prior outcome, $\mathbf{m}$,
and the MLE experience estimator, $\bar{\mathbf{x}}$. However, the weights of
the mixture now come from a $p \times p$ matrix credibility factor,
$\mathbf{Z}$, so that the experience in all dimensions is (usually) useful in
forecasting any particular $x_i$. In this case, we say we have
full-dimensional credibility; such forecasts are intuitively better than just
forecasting $x_i$ using $(\bar{\mathbf{x}})_i$.

If we define the matrix of time constants:

$$\mathbf{N} = \mathbf{E}\mathbf{D}^{-1} , \tag{12}$$

it turns out that the matrix credibility factor is analogous to (4):

$$\mathbf{Z} = n(\mathbf{N} + n\mathbf{I})^{-1} ; \qquad (\mathbf{I} - \mathbf{Z}) = \mathbf{N}(\mathbf{N} + n\mathbf{I})^{-1} .$$

$\mathbf{Z}$, $(\mathbf{I} - \mathbf{Z})$, $\mathbf{N}$, and their powers and
inverses all commute, even though $\mathbf{N}$ is not necessarily symmetric.
Furthermore, if we find the eigenvalues $\{\nu_i\}$ of $\mathbf{N}$ through
$|\mathbf{N} - \nu\mathbf{I}| = 0$, then the eigenvalues of $\mathbf{Z}$ are
$\{n/(n + \nu_i)\}$. One can use this to show that, in the non-degenerate
case, $\mathbf{Z} \to \mathbf{I}$ as $n \to \infty$, so that the classical
estimator $\bar{\mathbf{x}}$ is the ultimate credibility forecast. However,
without further restrictions, it is possible for the components of
$\mathbf{Z}$, which are rational functions of $n$, to show non-monotone
behavior.
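
The algebraic properties just listed can be checked numerically. The NumPy sketch below builds the time-constant matrix of (12) and the matrix credibility factor from two hypothetical covariance components (the numerical values of `E` and `D` are assumptions made only for illustration), and exposes the quantities needed to confirm the commutation and eigenvalue statements.

```python
import numpy as np


def credibility_matrix(E, D, n):
    """Return (N, Z) with N = E D^{-1} as in (12) and Z = n (N + n I)^{-1}."""
    p = E.shape[0]
    N = E @ np.linalg.inv(D)
    Z = n * np.linalg.inv(N + n * np.eye(p))
    return N, Z


# Hypothetical intra-risk (E) and inter-risk (D) covariance components.
E = np.array([[2.0, 0.5], [0.5, 1.0]])
D = np.array([[1.0, 0.3], [0.3, 0.5]])
n = 10
N, Z = credibility_matrix(E, D, n)

nu = np.linalg.eigvals(N).real        # eigenvalues of N (real, since E and D are positive definite)
z_eigs = np.linalg.eigvals(Z).real    # should match n / (n + nu)

# For very large n the credibility matrix approaches the unit matrix.
_, Z_large = credibility_matrix(E, D, 10**8)
```

Note that `N` itself is not symmetric here, which is the situation the text warns about.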

With this "shrinkage" result established, the author next sought to find
multi-dimensional priors and likelihoods for which (8) is exact. But here
certain difficulties arose that will be explained after we consider
multidimensional exponential families.
3.  NATURAL CONJUGATE PRIORS IN THE MULTIVARIATE EXPONENTIAL FAMILY

The essential reasons why (6) (7) provide the exact Bayesian result (4)
are that $\bar{x}$ is the sufficient statistic for $\theta$, because (6) is in
the exponential family, and that (7) is the natural conjugate prior to (6),
which is closed under sampling; that is, the posterior density
$p(\theta \mid \mathcal{D}_x)$ is in the same family as (7), with the updating
$n_{00} \leftarrow n_{00} + n$, and $x_0 \leftarrow x_0 + n\bar{x}$.

We now attempt to generalize this approach.

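If $\mathbf{x}$ is a $p$-dimensional random variable which depends upon a
$q$-dimensional vector of parameters $\boldsymbol{\theta}$, the general
multivariate exponential likelihood can be written:

$$p(\mathbf{x} \mid \boldsymbol{\theta}) = \frac{a(\mathbf{x})\, e^{-\boldsymbol{\theta}'\mathbf{f}(\mathbf{x})}}{c(\boldsymbol{\theta})} , \qquad (\mathbf{x} \in X) \tag{13}$$

where $a$ is the kernel, $\mathbf{f}$ is the vector-valued function of
sufficient statistics, and $c$ is the normalization factor, chosen so that
$\int\!\!\int\!\!\int p(\mathbf{x} \mid \boldsymbol{\theta})\, d\mathbf{x} = 1$.
The parameter space $\Theta$ consists of all points in $R^q$ for which $c$ is
finite; $\Theta$ is known to be convex but not much else is known in general.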

Natural conjugate priors for random $\boldsymbol{\theta}$ have been
constructed for several specific multivariate distributions in the
exponential family (see, e.g., Johnson and Kotz (1972)); however, there seems
to be little discussion in the literature about how to proceed in general.
Based upon one-dimensional procedures, the usual approach is to regard (13)
as a function of $\boldsymbol{\theta}$, and to replace functions of
$\mathbf{x}$ and any constants by hyperparameters; this gives a prior density:

$$p(\boldsymbol{\theta}) = p(\boldsymbol{\theta} \mid n_{00} ; \mathbf{f}_{00}) = \frac{[c(\boldsymbol{\theta})]^{-n_{00}}\, e^{-\boldsymbol{\theta}'\mathbf{f}_{00}}}{d(n_{00}, \mathbf{f}_{00})} , \qquad (\boldsymbol{\theta} \in \Theta) \tag{14}$$
with $q + 1$ hyperparameters $(n_{00}, \mathbf{f}_{00})$ that may have to be
restricted so that the normalization, $d$, exists. Usually (14) leads to a
prior of recognizable and "interesting" form, even though no advance
assurances can be given in the general case. However, if
$p(\boldsymbol{\theta})$ is found to be an "honest" density, then we see
immediately that it is closed under sampling; i.e., given $n$ independent
samples $\{\mathbf{x}_t ; t = 1, \dots, n\}$ from (13) (with fixed
$\boldsymbol{\theta}$), we find $p(\boldsymbol{\theta} \mid \mathcal{D}_x)$ to
be of form (14), with updating:

$$n_{00} \leftarrow n_{00} + n ; \qquad \mathbf{f}_{00} \leftarrow \mathbf{f}_{00} + \sum_{t=1}^{n} \mathbf{f}(\mathbf{x}_t) . \tag{15}$$

It is now also clear why $\mathbf{f}$ is called the vector of sufficient
statistics.

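Most applications of interest concern predictive distributions (Aitchison
and Dunsmore (1975)). A priori, the marginal distribution of $\mathbf{x}$ is:

$$p(\mathbf{x}) = \int p(\mathbf{x} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta} = \frac{a(\mathbf{x})\, d(n_{00} + 1, \mathbf{f}_{00} + \mathbf{f}(\mathbf{x}))}{d(n_{00}, \mathbf{f}_{00})} , \tag{16}$$

which is also usually of "interesting" form, if (13) and (14) are.
Furthermore, it follows that the predictive density
$p(\mathbf{x} \mid \mathcal{D}_x)$ is also closed under sampling, and uses the
updating (15) in both numerator and denominator of (16).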
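
The closure-under-sampling property rests entirely on the additive updating (15). A brief Python sketch, under the assumption of a scalar observable with the illustrative sufficient statistics $\mathbf{f}(x) = (x, x^2)$, shows that batch and sequential updating lead to the same hyperparameters; the function name and numbers are hypothetical.

```python
import numpy as np


def update_hyperparameters(n00, f00, samples, f):
    """Updating (15): n00 <- n00 + n, f00 <- f00 + sum_t f(x_t)."""
    stats = np.array([f(x) for x in samples], dtype=float)
    return n00 + len(samples), np.asarray(f00, dtype=float) + stats.sum(axis=0)


# Hypothetical sufficient statistics for a scalar observable: f(x) = (x, x^2).
def f(x):
    return np.array([x, x * x])


samples = [1.0, 2.0, 4.0]

# All data at once.
n_batch, f_batch = update_hyperparameters(2.0, [0.5, 1.0], samples, f)

# One observation at a time.
n_seq, f_seq = 2.0, np.array([0.5, 1.0])
for x in samples:
    n_seq, f_seq = update_hyperparameters(n_seq, f_seq, [x], f)
```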

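In Jewell (1974b), it was shown that a certain generalized credibility
forecast of the mean value of the function $\mathbf{f}$ then follows:

$$\mathcal{E}\{\mathbf{f}(\mathbf{x}) \mid \mathcal{D}_x\} = (1 - z_{00})\, \mathcal{E}\{\mathbf{f}(\mathbf{x})\} + z_{00}\, \frac{1}{n} \sum_{t=1}^{n} \mathbf{f}(\mathbf{x}_t) , \tag{17}$$

with a scalar credibility factor equal to (4). This result requires rather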
strong assumptions about the regularity of $p(\boldsymbol{\theta})$ on the
boundary of $\Theta$, which, however, usually seem to be satisfied for
practical distributions.

If we specialize to the linear exponential family with
$\mathbf{f}(\mathbf{x}) = \mathbf{x}$ (and $q = p$), we find the exact
multi-dimensional mean forecast:

$$\mathcal{E}\{\mathbf{x} \mid \mathcal{D}_x\} = (1 - z_{00})\, \mathbf{m} + z_{00}\, \bar{\mathbf{x}} , \tag{18}$$

which should be compared with the credibility approximation (8). This result
is obtained, for example, if $\mathbf{x}$ is multinormal, with a random mean
vector $\boldsymbol{\mu}$ and fixed precision matrix $\mathbf{W}$, with a
multinormal prior on $\boldsymbol{\mu}$:

$$(\mathbf{x} \mid \boldsymbol{\mu}) \sim N_p(\boldsymbol{\mu} ; \mathbf{W}) ; \qquad \boldsymbol{\mu} \sim N_p(\mathbf{m} ; n_{00}\mathbf{W}) \tag{19}$$

(see Appendix for distributions). However, because $z_{00}$ is a scalar
credibility factor, we see that (18) is rather uninteresting compared to (8),
since the forecast of each $x_i$ is given by a credibility forecast using
only $(\bar{\mathbf{x}})_i$. Furthermore, since
$\mathbf{N} = n_{00}\mathbf{I}_p$, each forecast has the same time constant
$n_{00}$! We shall call such forecasts self-dimensional, in contrast to the
full-dimensional form (8). To rectify this unsatisfactory state of affairs,
we shall have to consider ways to "enrich" (18), by adding more
hyperparameters to (14) or (19).
4.  ENRICHED PRIORS FROM LINEARLY DEPENDENT MULTIVARIATE EXPONENTIAL FAMILIES

In Jewell (1974b), the observation was made that the thinness of (14)
was due to the scalar nature of the factor $[c(\boldsymbol{\theta})]^{-n_{00}}$.

There  have  been  various  proposals  for  enriching  multivariate  priors, 
particularly  the  multinormal  prior.  Those  known  to  the  author  are: 


(1)  Press (1981) refers first to enrichment as the process of
examining the behavior of $p(\mathbf{x} \mid \boldsymbol{\theta})$ as a
function of $\boldsymbol{\theta}$, which we used to find (14), above. But he
also suggests that various lower-dimensional marginals, considered as
functions of $\boldsymbol{\theta}$, could be multiplied together to get
$p(\boldsymbol{\theta})$. Such a procedure was followed by Kaufman for the
multinormal priors in unpublished papers in 1965 and 1967; however, it seems
to lead to rather complex results with little motivation, and does not solve
the full-dimensional credibility problem;

(2)  One can, of course, also multiply (14) by an arbitrary function, say
$[g(\boldsymbol{\theta})]^{-n_1}$. But, since $n_1$ will not participate in
the updating (15) because $g(\boldsymbol{\theta})$ has no relation to
$c(\boldsymbol{\theta})$, it is essentially a nuisance hyperparameter, and no
real enrichment has taken place, except perhaps to put the prior in some
standard form. See also the scale changes in Section 10;

(3)  Another procedure, often recommended when one-dimensional priors do
not match empirical priors, is to use a model-mixture prior, by combining a
(small) number of natural conjugate priors (14) with different coefficients:

$$p(\boldsymbol{\theta}) = \sum_k \alpha_k\, p(\boldsymbol{\theta} \mid n_{k0}, \mathbf{f}_{k0}) . \tag{20}$$

Here the $\alpha_k \ge 0$, $\sum_k \alpha_k = 1$, and the various
hyperparameters are determined by matching the empirical prior. The
difficulty with this approach is not with the updating, which follows (15)
for every "model" in (20), but is that the weighting coefficients,
$\alpha_k(\mathcal{D}_x)$, must also be updated, usually in a messy algebraic
manner. And, of course, when applied to the credibility problem, (20) will
merely stabilize, for large $n$, on the "best" $n_{k0}$ for self-dimensional
forecasts;

(4)  Another suggestion for enrichment is to regard (14) as part of an
hierarchical model (Jewell (1975b)), with the hyperparameters
$(n_{00}, \mathbf{f}_{00})$ also considered as random variables, with their
own hyperprior density $p(n_{00}, \mathbf{f}_{00})$. This approach requires an
extremely simple normalization $d$, and probably extremely few random
hyperparameters in order to carry out the necessary marginalization. Dickey,
Lindley, Press and James (1981) do this with a model in which there are only
2 random and 1 fixed hyperparameters.


Since the thinness of (14) is due to the scalar nature of the factor
$[c(\boldsymbol{\theta})]^{-n_{00}}$, it follows that the number of
time-constant hyperparameters can be increased if $c(\boldsymbol{\theta})$ can
be decomposed into several factors that depend upon various subsets of
$\boldsymbol{\theta}$. One way the decomposition might occur is if both the
sufficient-statistics vector, $\mathbf{f}(\mathbf{x})$, and the kernel,
$a(\mathbf{x})$, could be decomposed into related components, through a
linear transformation; it follows then that $c(\boldsymbol{\theta})$ would
undergo a similar decomposition, and each factor could have its own
hyperparameter, $n_{i0}$. We call such likelihoods, where this decomposition
works, linearly dependent multivariate exponential families (LDMEFs).

To fix ideas, consider first the case where the sample mean is the only
sufficient statistic, i.e., $\mathbf{f}(\mathbf{x}) = \mathbf{x}$ and $q = p$.
Let $\mathbf{A}$ be an invertible $p \times p$ matrix, and consider first a
vector $\mathbf{y}$ of independent random risks, each component having a
linear exponential likelihood with parameters
$(\phi_1, \phi_2, \dots, \phi_p)$, so that the joint likelihood density is:

$$p(\mathbf{y} \mid \boldsymbol{\phi}) = \prod_{i=1}^{p} \frac{b_i(y_i)\, e^{-\phi_i y_i}}{c_i(\phi_i)} , \qquad (\mathbf{y} \in Y) \tag{21}$$


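for some appropriate vector space, $Y$, kernels $\{b_i\}$, and normalizations
$\{c_i\}$. For each component $i$, an appropriate natural conjugate prior
similar to (7) is constructed, so that the (independently distributed)
$\{\phi_i\}$ have a joint prior:

$$p(\boldsymbol{\phi}) = \prod_{i=1}^{p} \frac{[c_i(\phi_i)]^{-n_{i0}}}{d_i(n_{i0}, y_{i0})}\; e^{-\boldsymbol{\phi}'\mathbf{y}_0} , \qquad (\boldsymbol{\phi} \in \Phi) \tag{22}$$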


for some appropriate vector space, $\Phi$, and the $2p$ hyperparameters,
$\mathbf{y}_0 = [y_{10}, y_{20}, \dots, y_{p0}]'$ and
$\mathbf{n}_0 = [n_{10}, n_{20}, \dots, n_{p0}]'$. Already the situation is
somewhat improved, for it can be seen that the exact form for
$\mathcal{E}\{\mathbf{y} \mid \mathcal{D}_y\}$, although "self-dimensional" as
in (18), now has a different time constant, $n_{i0}$, for each dimension. In
other words, the matrix credibility forecast (8) is exact, but with a
diagonal matrix of time constants:

$$\mathbf{N}_0 = \mathrm{diag}(\mathbf{n}_0) \tag{23}$$

and corresponding diagonal credibility matrices

$$\mathbf{Z}_0 = n(\mathbf{N}_0 + n\mathbf{I})^{-1} .$$
Now, make the (full rank) transformations:

$$\mathbf{y} = \mathbf{A}^{-1}\mathbf{x} ; \qquad \boldsymbol{\phi} = \mathbf{A}'\boldsymbol{\theta} ; \tag{24}$$

this changes (21) into a likelihood proportional to:

$$p(\mathbf{x} \mid \boldsymbol{\theta}) \propto \prod_{i=1}^{p} b_i((\mathbf{A}^{-1}\mathbf{x})_i)\, e^{-\boldsymbol{\theta}'\mathbf{x}} , \tag{25}$$

and by comparison with (13), with $\mathbf{f}(\mathbf{x}) = \mathbf{x}$, we
see that this is precisely the result we will have if the linear
transformation $\mathbf{x} = \mathbf{A}\mathbf{y}$ factors the kernel
$a(\mathbf{x})$ into $p$ components, each depending upon a single $y_i$
$(i = 1, 2, \dots, p)$.

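If the likelihood permits this factorization, it then follows that the
enriched LDMEF prior on $\boldsymbol{\theta}$ is obtained from (22) as:

$$p(\boldsymbol{\theta}) \propto \prod_{i=1}^{p} [c_i((\mathbf{A}'\boldsymbol{\theta})_i)]^{-n_{i0}}\; e^{-\boldsymbol{\theta}'\mathbf{x}_0} , \tag{26}$$

with the hyperparameter transformation

$$\mathbf{x}_0 = \mathbf{A}\mathbf{y}_0 . \tag{27}$$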


The  following  result  is  then  obtained: 

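Theorem:

If $p(\mathbf{x} \mid \boldsymbol{\theta})$ is an LDMEF likelihood, as defined
in (21) (25), and the enriched prior (26) is used with hyperparameters
$\{n_{i0} > 0\}$, the full-dimensional forecast (8) for
$\mathcal{E}\{\mathbf{x} \mid \mathcal{D}_x\}$ applies, with time constant
matrix:

$$\mathbf{N} = \mathbf{A}\mathbf{N}_0\mathbf{A}^{-1} . \tag{28}$$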
Proof:

As mentioned above, the proof requires the assumption that
$p(\boldsymbol{\theta})$ is well-behaved on the boundary of $\Theta$. After
showing that $\mathbf{x}_0 = \mathbf{N}\mathbf{m}$, the result follows after
showing that the correct updating with data $\mathcal{D}_x$ is given by:

$$\mathbf{N} \leftarrow \mathbf{N} + n\mathbf{I} ; \qquad \mathbf{x}_0 \leftarrow \mathbf{x}_0 + \sum_{t=1}^{n} \mathbf{x}_t . \tag{29}$$

Further details may be found in Jewell (1974b). ∎
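
The mechanics of the theorem can be illustrated numerically. In the NumPy sketch below, the transformation `A`, the time constants `n0`, the hyperparameters `y0`, and the sample mean are all hypothetical values; the point is only that a diagonal credibility forecast in the $\mathbf{y}$-space becomes a full-dimensional forecast in the $\mathbf{x}$-space, with time-constant matrix (28).

```python
import numpy as np

A = np.array([[1.0, 0.5], [0.2, 1.0]])   # hypothetical invertible transformation
n0 = np.array([2.0, 8.0])                # hypothetical time constants n_i0
y0 = np.array([4.0, 8.0])                # hypothetical hyperparameters y_i0
n = 5                                    # number of observations
A_inv = np.linalg.inv(A)
I = np.eye(2)

N0 = np.diag(n0)                         # (23)
N = A @ N0 @ A_inv                       # (28)
Z0 = n * np.linalg.inv(N0 + n * I)       # diagonal credibility in y-space
Z = n * np.linalg.inv(N + n * I)         # full credibility in x-space

x0 = A @ y0                              # (27)
m = np.linalg.solve(N, x0)               # prior mean, from x0 = N m

x_bar = np.array([3.0, 1.0])             # hypothetical sample mean of x
y_bar = A_inv @ x_bar                    # corresponding sample mean of y

forecast_y = (I - Z0) @ (y0 / n0) + Z0 @ y_bar   # self-dimensional forecast of y
forecast_x = (I - Z) @ m + Z @ x_bar             # full-dimensional forecast (8)
```

Because the two time constants differ and `A` is not diagonal, `Z` has non-zero off-diagonal entries: experience in one dimension now contributes to the forecast in the other.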

To show that this result is not vacuous, we reconsider the problem of
the multinormal prior with random mean $\boldsymbol{\mu}$ and fixed precision
matrix $\mathbf{W}$, whose thin prior was given in (19). It follows from the
above that the "thick" prior that gives a full-dimensional credibility
forecast is simply:

$$\boldsymbol{\mu} \sim N_p(\mathbf{m} ; \mathbf{W}\mathbf{N} = \mathbf{N}'\mathbf{W}) . \tag{30}$$

Here $\mathbf{E} = \mathbf{W}^{-1}$ is fixed, but
$\mathbf{D} = (\mathbf{W}\mathbf{N})^{-1} = \mathbf{N}^{-1}\mathbf{E}$. The
initial precision matrix associated with the forecast
$\mathcal{E}\{\mathbf{x} \mid \mathcal{D}_x\} = \mathcal{E}\{\boldsymbol{\mu} \mid \mathcal{D}_x\}$
is $\mathbf{N}'\mathbf{W} = \mathbf{D}^{-1}$, so that, after $n$ observations,
the precision improves (homoscedastically) to
$(\mathbf{N}' + n\mathbf{I})\mathbf{W} = \mathbf{D}^{-1} + n\mathbf{E}^{-1}$.
These results are well-known (see, inter alia, De Groot (1970), page 175).
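
As a check on this fixed-precision case, the sketch below compares the credibility forecast (8) with the textbook precision-weighted posterior mean of a multinormal mean. The matrices `E` and `D`, the prior mean, and the sample mean are hypothetical values chosen only so that the example runs.

```python
import numpy as np

E = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical E = W^{-1}
D = np.array([[1.0, 0.3], [0.3, 0.5]])   # hypothetical prior covariance of mu
W = np.linalg.inv(E)
N = E @ np.linalg.inv(D)                 # time constants, as in (12)
m = np.array([1.0, -1.0])                # hypothetical prior mean
x_bar = np.array([2.0, 0.5])             # hypothetical sample mean
n = 4
I = np.eye(2)

# Credibility forecast (8).
Z = n * np.linalg.inv(N + n * I)
credibility_mean = (I - Z) @ m + Z @ x_bar

# Standard conjugate-normal posterior mean, with precision D^{-1} + n E^{-1}.
prior_precision = W @ N                  # symmetric, equal to N'W = D^{-1}
posterior_precision = prior_precision + n * W
posterior_mean = np.linalg.solve(
    posterior_precision, prior_precision @ m + n * W @ x_bar
)
```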

We turn now to the fundamentally more difficult problem of the
multinormal with both random mean and random precision.


5.  THE ANDO-KAUFMAN MULTINORMAL PRIOR

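Consider the $p$-dimensional multinormal with random mean vector
$\boldsymbol{\mu}$, and random precision matrix $\boldsymbol{\Omega}$, for
which the likelihood density is:

$$p(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Omega}) = (2\pi)^{-p/2}\, |\boldsymbol{\Omega}|^{1/2} \exp\left\{ -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{x} - \boldsymbol{\mu}) \right\} . \tag{31}$$

This is in the family (13), with $f_i(\mathbf{x}) = x_i$ and
$\theta_i = -(\boldsymbol{\Omega}\boldsymbol{\mu})_i$ for
$i = 1, 2, \dots, p$, and $f_i(\mathbf{x}) = x_j x_k$ and
$\theta_i = \tfrac{1}{2}\omega_{jk}$ for $(j,k) = 1, 2, \dots, p$, and
$i = k + pj = p + 1, p + 2, \dots, (p^2 + p) = q$. Thus, we have many more
random parameters than observables. The kernel $a(\mathbf{x}) = 1$, and the
normalizing factor (in traditional notation) is:

$$c(\boldsymbol{\mu}, \boldsymbol{\Omega}) = (2\pi)^{p/2}\, |\boldsymbol{\Omega}|^{-1/2} \exp\left\{ \tfrac{1}{2} \boldsymbol{\mu}'\boldsymbol{\Omega}\boldsymbol{\mu} \right\} . \tag{32}$$

To form the natural conjugate prior, we first assume
$q + 1 = p^2 + p + 1$ hyperparameters
$\{n_{00} ; \mathbf{x}_0 ; \mathbf{R}_0\}$, where $\mathbf{R}_0$ is a
$p \times p$ matrix, and then follow (14), with (32) expressed in terms of
$\boldsymbol{\theta}$. Upon transforming back to traditional notation (the
Jacobian of the transformation is $|\boldsymbol{\Omega}|$), we obtain the
thin prior:

$$p(\boldsymbol{\mu} ; \boldsymbol{\Omega}) \propto |\boldsymbol{\Omega}|^{\frac{1}{2}(n_{00}+2)} \exp\left\{ -\tfrac{1}{2} \boldsymbol{\mu}'(n_{00}\boldsymbol{\Omega})\boldsymbol{\mu} + \boldsymbol{\mu}'\boldsymbol{\Omega}\mathbf{x}_0 - \tfrac{1}{2} \mathrm{tr}(\boldsymbol{\Omega}\mathbf{R}_0) \right\} . \tag{33}$$

By factorization into
$p(\boldsymbol{\mu} \mid \boldsymbol{\Omega}) \cdot p(\boldsymbol{\Omega})$,
this can be seen to be a Normal-Wishart density. For the conditional mean:

$$p(\boldsymbol{\mu} \mid \boldsymbol{\Omega}) \propto |n_{00}\boldsymbol{\Omega}|^{1/2} \exp\left\{ -\tfrac{1}{2} (\boldsymbol{\mu} - n_{00}^{-1}\mathbf{x}_0)'(n_{00}\boldsymbol{\Omega})(\boldsymbol{\mu} - n_{00}^{-1}\mathbf{x}_0) \right\} , \tag{34}$$

that is, a multinormal, with
$(\boldsymbol{\mu} \mid \boldsymbol{\Omega}) \sim N_p(n_{00}^{-1}\mathbf{x}_0 ; n_{00}\boldsymbol{\Omega})$.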
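This leaves, after some algebra, the precision matrix density:

$$p(\boldsymbol{\Omega}) \propto |\boldsymbol{\Omega}|^{\frac{1}{2}(n_{00}+1)} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}(\boldsymbol{\Omega}\mathbf{U}_0) \right\} ; \qquad \mathbf{U}_0 = \mathbf{R}_0 - n_{00}^{-1}\mathbf{x}_0\mathbf{x}_0' , \tag{35}$$

defined only in those points of $R^{\frac{1}{2}p(p+1)}$ for which
$\boldsymbol{\Omega}$ is positive definite. (35) is the Wishart density,
$\boldsymbol{\Omega} \sim W_p(n_{00} + p + 2 ; \mathbf{U}_0)$, for which
moments are given in the Appendix. We now note that $\mathbf{R}_0$ (and hence
$\mathbf{U}_0$) must be symmetric, not for the trace, but for the moment
formulae (39) below. Thus, there are really only $\tfrac{1}{2}(p^2 + 3p + 2)$
free hyperparameters.

The marginal distribution of $\boldsymbol{\mu}$ requires some additional
algebra (see De Groot (1970) or Press (1981)), which finally gives:

$$p(\boldsymbol{\mu}) \propto \left[ 1 + (\boldsymbol{\mu} - n_{00}^{-1}\mathbf{x}_0)'\, n_{00}\mathbf{U}_0^{-1}\, (\boldsymbol{\mu} - n_{00}^{-1}\mathbf{x}_0) \right]^{-\frac{1}{2}(n_{00}+p+3)} , \tag{36}$$

which is seen to be a multivariate Student-t density,
$\boldsymbol{\mu} \sim S_p(n_{00} + 3 ; n_{00}^{-1}\mathbf{x}_0 ; n_{00}(n_{00} + 3)\mathbf{U}_0^{-1})$.
Again, moments are found in the Appendix.

Similarly, the marginal outcome density is also found to be Student-t, with

$$\mathbf{x} \sim S_p(n_{00} + 3 ; n_{00}^{-1}\mathbf{x}_0 ; n_{00}(n_{00} + 3)(n_{00} + 1)^{-1}\mathbf{U}_0^{-1}) . \tag{37}$$

The updating is, from (31):

$$n_{00} \leftarrow n_{00} + n ; \qquad \mathbf{x}_0 \leftarrow \mathbf{x}_0 + \sum_{t=1}^{n} \mathbf{x}_t ; \qquad \mathbf{R}_0 \leftarrow \mathbf{R}_0 + \sum_{t=1}^{n} \mathbf{x}_t\mathbf{x}_t' . \tag{38}$$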
The relationship with the moments defined in (11) follows from the moments
of (35) (37):

$$\mathbf{m} = n_{00}^{-1}\mathbf{x}_0 ; \quad \mathbf{E} = (n_{00} + 1)^{-1}\mathbf{U}_0 ; \quad \mathbf{D} = n_{00}^{-1}(n_{00} + 1)^{-1}\mathbf{U}_0 ; \quad \mathbf{C} = n_{00}^{-1}\mathbf{U}_0 ; \tag{39}$$

so that, as expected, the forecast mean
$\mathcal{E}\{\mathbf{x} \mid \mathcal{D}_x\}$ is of the self-dimensional form
(18). A new credibility formula of interest is the forecast of the
covariance matrix of (37), as updated:

$$\mathcal{V}\{\mathbf{x} \mid \mathcal{D}_x\} = (1 - z_{00})\, \mathbf{C} + z_{00} \left[ \frac{1}{n} \sum_{t=1}^{n} (\mathbf{x}_t - \bar{\mathbf{x}})(\mathbf{x}_t - \bar{\mathbf{x}})' \right] + z_{00}(1 - z_{00})(\bar{\mathbf{x}} - \mathbf{m})(\bar{\mathbf{x}} - \mathbf{m})' . \tag{40}$$

Again we see the familiar convex combination of the prior outcome covariance
and the classical MLE covariance estimator, supplemented here by an
intermediate term which uses the variation of the sample means about their
true values ($z_{00}(1 - z_{00})$ attains its maximum value at
$n = n_{00}$). Results analogous to (40) are given by both De Groot (1970)
and Press (1981), although not in as appealing a form.
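
The covariance forecast (40) can be compared against the direct route of updating the hyperparameters by (38) and then reading off $\mathbf{C} = n_{00}^{-1}\mathbf{U}_0$ from (39). The NumPy sketch below does this for hypothetical hyperparameters and data; the function name and all numerical values are assumptions made for illustration.

```python
import numpy as np


def covariance_forecast(C, m, n00, X):
    """Formula (40), given prior covariance C, prior mean m and data rows X."""
    n = X.shape[0]
    z = n / (n00 + n)
    x_bar = X.mean(axis=0)
    S = (X - x_bar).T @ (X - x_bar) / n        # classical MLE covariance estimator
    dev = np.outer(x_bar - m, x_bar - m)       # deviation of sample mean from prior mean
    return (1 - z) * C + z * S + z * (1 - z) * dev


# Hypothetical hyperparameters (n00, x0, R0) and four observations.
n00 = 3.0
x0 = np.array([3.0, 6.0])
R0 = np.array([[9.0, 7.0], [7.0, 20.0]])
X = np.array([[1.5, 2.0], [0.5, 3.0], [2.0, 1.0], [1.0, 2.5]])

m = x0 / n00
U0 = R0 - np.outer(x0, x0) / n00               # as in (35)
C = U0 / n00                                   # as in (39)

# Direct route: update (n00, x0, R0) by (38), then recompute C = U0 / n00.
n1 = n00 + X.shape[0]
x1 = x0 + X.sum(axis=0)
R1 = R0 + X.T @ X
C_updated = (R1 - np.outer(x1, x1) / n1) / n1

C_forecast = covariance_forecast(C, m, n00, X)
```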

The prior (33) was discovered by Ando and Kaufman (1965), and its
"thinness" is well-known. The usual criticism is that one cannot set both the
means and covariances of $\boldsymbol{\mu}$ and $\boldsymbol{\Omega}$
independently, or to put it differently, once $\mathcal{E}\{\boldsymbol{\mu}\}$
and $\mathcal{E}\{\boldsymbol{\Omega}\}$ or
$\mathcal{E}\{\boldsymbol{\Omega}^{-1}\}$ are given, there is only one free
parameter. From our point of view, the limitation is that the two components
of observational covariance cannot be specified independently, since
$\mathbf{E} = n_{00}\mathbf{D}$, and this makes the credibility forecasts of
mean and covariance (18) (40) both self-dimensional.
Actually, the prior given by Ando and Kaufman is slightly more general
than (33), with $|\boldsymbol{\Omega}|^{\frac{1}{2}(n_{00}+2)}$ replaced by
$|\boldsymbol{\Omega}|^{\frac{1}{2}(\alpha - p)}$, where $\alpha$ is the
"degrees of freedom", not necessarily equal to $p + n_{00} + 2$. This leads
to invariant nuisance hyperparameter enrichment, of the type already
discussed, that merely scales the observational covariances, independent of
the means. As mentioned above, in some unpublished work in 1965 and 1967,
Kaufman additionally enriched this prior by multiplying the Wishart density
by arbitrary powers of the products of determinants of principal minors of
$\boldsymbol{\Omega}$, thus introducing $p - 1$ additional hyperparameters.
But the resulting formulae are quite complicated, and do not appear to give
credibility results.
6.  LINEARLY-DEPENDENT MULTINORMAL PRIOR

We now use the methods of Section 4 to provide a LDMEF prior to (31),
keeping in mind that we want to obtain a full-dimensional forecast (8) and,
hopefully, to generalize (40). If we apply the transformations:

$$\mathbf{x} = \mathbf{A}\mathbf{y} ; \qquad \boldsymbol{\mu} = \mathbf{A}\boldsymbol{\lambda} ; \tag{41}$$

we see that we obtain transformed variables
$(\mathbf{y} ; \boldsymbol{\lambda})$ of full rank, since $\mathbf{A}$ is
assumed to be $p \times p$ and invertible. But this then leads in (31) to a
transformation of the precision matrix:

$$\boldsymbol{\Pi} = \mathbf{A}'\boldsymbol{\Omega}\mathbf{A} ; \qquad \boldsymbol{\Omega} = (\mathbf{A}^{-1})'\boldsymbol{\Pi}\mathbf{A}^{-1} ; \tag{42}$$

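and, since we require that the $\{y_i\}$ be statistically independent, the
transformed precision matrix $\boldsymbol{\Pi}$ must be diagonal with
probability one! In other words, to factor the last $p^2$ components of
$\{f_i(\mathbf{x})\}$, we must impose constraints on the associated
parameters. This also factors the term $|\boldsymbol{\Omega}|^{1/2}$.

This then permits one to introduce a random mean, $\lambda_i$, and a random
precision, $\pi_i$, for each $y_i$, and to set
$(y_i \mid \lambda_i, \pi_i) \sim N_1(\lambda_i ; \pi_i)$, so that the
equivalent of (21) is:

$$p(\mathbf{y} \mid \boldsymbol{\lambda}, \boldsymbol{\Pi}) \propto |\boldsymbol{\Pi}|^{1/2} \exp\left\{ -\tfrac{1}{2} (\mathbf{y} - \boldsymbol{\lambda})'\boldsymbol{\Pi}(\mathbf{y} - \boldsymbol{\lambda}) \right\} , \tag{43}$$

with $\boldsymbol{\lambda}' = [\lambda_1, \dots, \lambda_p]$ ;
$\boldsymbol{\pi}' = [\pi_1, \dots, \pi_p]$ ; and
$\boldsymbol{\Pi} = \mathrm{diag}(\boldsymbol{\pi})$.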

Through inverse transformations, (43) then reverts to (31), although
we must be careful in the sequel with terms involving $\boldsymbol{\Omega}$,
since it is no longer of rank $\tfrac{1}{2}p(p + 1)$. If preferred, one can
think of a full-rank $\boldsymbol{\Omega}$ being constrained by
$\tfrac{1}{2}p(p - 1)$ equations taken from (42):

$$\sum_k \sum_l a_{ki}\, a_{lj}\, \omega_{kl} = 0 \qquad (p \ge i > j \ge 1) . \tag{44}$$

This loss of rank will be somewhat compensated for by the introduction of
more hyperparameters.
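
The constraint on the precision matrix can be made concrete with a short NumPy sketch. The transformation `A` and the component precisions `pi` below are hypothetical; the sketch builds $\boldsymbol{\Omega}$ from a diagonal $\boldsymbol{\Pi}$ through (42) and collects the off-diagonal entries of $\mathbf{A}'\boldsymbol{\Omega}\mathbf{A}$, which are the quantities that must vanish.

```python
import numpy as np

A = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.3],
              [0.0, 0.4, 1.0]])          # hypothetical invertible transformation
pi = np.array([2.0, 0.5, 1.5])           # hypothetical component precisions
p = len(pi)

A_inv = np.linalg.inv(A)
Omega = A_inv.T @ np.diag(pi) @ A_inv    # second relation in (42)
Pi = A.T @ Omega @ A                     # first relation in (42)

# Strictly lower-triangular entries of A' Omega A: one constraint per pair i > j.
constraints = [Pi[i, j] for i in range(p) for j in range(i)]
```

For $p = 3$ there are three such constraints, so $\boldsymbol{\Omega}$, although a full $3 \times 3$ symmetric matrix, carries only three free random quantities.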

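The independent natural-conjugate prior for the random parameters
$(\lambda_i, \pi_i)$ turns out to be a one-dimensional version of (33):

$$p(\lambda_i, \pi_i) \propto \pi_i^{\frac{1}{2}(n_{i0}+2)} \exp\left\{ -\tfrac{1}{2} \lambda_i^2 (n_{i0}\pi_i) + \lambda_i \pi_i y_{i0} - \tfrac{1}{2} r_{i0}\pi_i \right\}$$

$$\qquad \propto \pi_i^{1/2} \exp\left\{ -\tfrac{1}{2} (\lambda_i - n_{i0}^{-1}y_{i0})^2 (n_{i0}\pi_i) \right\} \cdot \pi_i^{\frac{1}{2}(n_{i0}+1)} \exp\left\{ -\tfrac{1}{2} v_{i0}\pi_i \right\} , \tag{45}$$

which can be called a Normal-Gamma, since
$(\lambda_i \mid \pi_i) \sim N_1(n_{i0}^{-1}y_{i0} ; n_{i0}\pi_i)$ and
$(\pi_i) \sim G(\tfrac{1}{2}(n_{i0} + 3) ; \tfrac{1}{2}v_{i0})$, where
$v_{i0} = r_{i0} - n_{i0}^{-1}y_{i0}^2$. The marginal density of $\lambda_i$
is a one-dimensional Student-t, similar to (37):

$$p(\lambda_i) \propto \left[ 1 + n_{i0} v_{i0}^{-1} (\lambda_i - n_{i0}^{-1}y_{i0})^2 \right]^{-\frac{1}{2}[n_{i0}+4]} . \tag{46}$$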

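For completeness, we record the first and second moments of $(\lambda_i \mid \pi_i)$:

$$\mathcal{E}\{\lambda_i \mid \pi_i\} = n_{i0}^{-1}y_{i0} \;;\quad \mathcal{V}\{\lambda_i \mid \pi_i\} = (n_{i0}\pi_i)^{-1} \;; \tag{47}$$

and of $\pi_i$ and $\pi_i^{-1}$:

$$\mathcal{E}\{\pi_i\} = (n_{i0}+3)\,v_{i0}^{-1} \;;\quad \mathcal{E}\{\pi_i^{-1}\} = (n_{i0}+1)^{-1}v_{i0} \;;$$

$$\mathcal{V}\{\pi_i\} = 2(n_{i0}+3)\,v_{i0}^{-2} \;;\quad \mathcal{V}\{\pi_i^{-1}\} = 2(n_{i0}+1)^{-2}(n_{i0}-1)^{-1}v_{i0}^{2} \;; \tag{48}$$

so that, unconditioning:

$$\mathcal{E}\{\lambda_i\} = n_{i0}^{-1}y_{i0} \;;\quad \mathcal{V}\{\lambda_i\} = n_{i0}^{-1}(n_{i0}+1)^{-1}v_{i0} \;; \tag{49}$$

and

$$\mathcal{E}\{y_i\} = n_{i0}^{-1}y_{i0} \;;\quad \mathcal{V}\{y_i\} = n_{i0}^{-1}v_{i0}. \tag{50}$$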

These are analogous to the Ando-Kaufman results (39), and show that, to obtain meaningful densities, we must pick $r_{i0} > 0$, $v_{i0} > 0$, and $n_{i0} > 0$ ($n_{i0} > 1$ if we want $\mathcal{V}\mathcal{V}\{y_i \mid \lambda_i, \pi_i\}$ to exist). Further, since $v_{i0} = r_{i0} - n_{i0}^{-1}y_{i0}^2$, this imposes a restriction of $r_{i0}n_{i0} > y_{i0}^2$ in the joint prior (45). In fact, since there are only three hyperparameters $(n_{i0}, y_{i0}, r_{i0})$, we see that there is already some "thinness" in specifying prior moments in one dimension.

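Updating with independent data $\mathcal{D}_y = \{y_t \;;\; t = 1, 2, \ldots, n\}$ is simply:

$$n_{i0} \leftarrow n_{i0} + n \;;\quad y_{i0} \leftarrow y_{i0} + \sum_{t=1}^{n} y_{it} \;;\quad r_{i0} \leftarrow r_{i0} + \sum_{t=1}^{n} y_{it}^2. \tag{51}$$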
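The one-dimensional results (47)-(51) are easy to check numerically. The following is a minimal sketch, not part of the original derivation; the class name `NormalGamma` and the numerical hyperparameter values are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class NormalGamma:
    """Hyperparameters (n0, y0, r0) of the one-dimensional prior (45).

    The class name and layout are illustrative assumptions.
    """
    n0: float
    y0: float
    r0: float

    @property
    def v0(self) -> float:
        # v0 = r0 - y0^2 / n0; must be positive for a proper density
        return self.r0 - self.y0 ** 2 / self.n0

    def moments(self) -> dict:
        n0, v0 = self.n0, self.v0
        return {
            "E_lambda": self.y0 / n0,           # (49)
            "V_lambda": v0 / (n0 * (n0 + 1)),   # (49)
            "E_inv_pi": v0 / (n0 + 1),          # (48)
            "E_y": self.y0 / n0,                # (50)
            "V_y": v0 / n0,                     # (50)
        }

    def update(self, data):
        # Updating rule (51) for independent observations y_1, ..., y_n
        return NormalGamma(
            self.n0 + len(data),
            self.y0 + sum(data),
            self.r0 + sum(y * y for y in data),
        )


# Hypothetical prior values and data
prior = NormalGamma(n0=2.0, y0=3.0, r0=8.0)
mom = prior.moments()
post = prior.update([1.0, 2.0, 4.0])
```

Under these assumed values, `mom["V_y"]` should equal `mom["E_inv_pi"] + mom["V_lambda"]`, which is the usual split of the observational variance into its conditional and parameter components.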

Our next step is to express the joint density of all $(\lambda_i, \pi_i)$ in matrix notation by using the previous definitions for $n_0$, $N_0$, and $y_0$ from Section 4, and setting

$$r_0 = [r_{10}, r_{20}, \ldots, r_{p0}]' \;;\quad R_0 = \mathrm{diag}(r_0). \tag{52}$$

We then obtain:

$$p(\lambda, \Pi) \propto \prod_{i=1}^{p} (\pi_i)^{\frac12(n_{i0}+2)} \exp\left\{-\tfrac12 \lambda'(\Pi N_0)\lambda + \lambda'\Pi y_0 - \tfrac12 \pi' r_0\right\}. \tag{53}$$


We can write the last term as $\pi' r_0 = \mathrm{tr}(\Pi R_0)$ if we remember that, with $\Pi$ diagonal, the trace "annihilates" any off-diagonal elements, so that $\mathrm{tr}(\Pi R) = \pi' r_0$ for any $R$ with diagonal equal to $r_0$! This point was responsible for several errors in the author's original paper. We now make the transformations $\lambda = A^{-1}\mu$, $\Pi = A'\Omega A$, and define:

$$N_0 = A^{-1}NA = A'N'(A')^{-1} \;;\quad R_0 = A^{-1}Q_0(A^{-1})' \;;$$

$$N = AN_0A^{-1} \;;\quad Q_0 = AR_0A' \;; \tag{54}$$


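and thus obtain the new, enriched LDMEF multinormal prior:

$$p(\mu, \Omega) \propto \prod_{i=1}^{p} \left[(A'\Omega A)_{ii}\right]^{\frac12(n_{i0}+2)} \exp\left\{-\tfrac12 \mu'(\Omega N)\mu + \mu'\Omega x_0 - \tfrac12\, 1'A'\Omega Q_0 (A')^{-1} 1\right\}, \tag{55}$$

where $1$ is the $p \times 1$ vector with all 1's ($\Pi \cdot 1 = \pi$ ; $R_0 \cdot 1 = r_0$). $Q_0$ is symmetric by construction, but $N$ need not be.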

By comparison with the thin prior (33), we see that $(n_{00}\Omega)$ in the exponent is replaced by $(\Omega N)$, a generalization similar to the constant covariance case (30). On the other hand, it does not seem possible to simplify the leading product terms in (55) in terms of $N$, unless, of course, all the time constants are the same, and $N_0 = n_{00}I = N$ transforms the leading product into $(|A|^2\,|\Omega|)^{\frac12(n_{00}+2)}$.

The messy last term in the exponent can be rewritten $\mathrm{tr}[A'\Omega Q_0 (A')^{-1}]$, or simply $\mathrm{tr}(\Omega Q_0)$ if we remember the previous caution about the trace, and realize we would get the same answer now for $\mathrm{tr}(\Omega Q)$, where $Q = ARA'$, if $R$ were any square matrix with diagonal equal to $r_0$.

In fact, because the same matrix $A$ has been used to transform the original $3p$ hyperparameters in $(y_0, N_0, R_0)$, it follows that there are strong relations among the $p + p^2 + \tfrac12(p^2 + p) = \tfrac32(p^2 + p)$ transformed hyperparameters $(x_0, N, Q_0)$, and the linearly-dependent random parameters $(\mu, \Omega)$, such as:

$$NQ_0 = Q_0N' \;; \tag{56}$$

(not expected, as $N$ is asymmetric), and:

$$\Omega N = N'\Omega \;;\quad \Omega Q_0 = (AA')^{-1}(Q_0\Omega)(AA') \;;\quad (\text{w.p.1}) \tag{57}$$

which are alternate versions of the $\tfrac12(p^2 - p)$ constraints (44). In short, we must not assume that the $\Omega$, $N$, and $Q_0$ of (55) have all the same properties as $\Omega$, $n_{00}$, and $Q_0$ of (33) or $N$ of (12). We will return to this point later.
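The relations (56) and (57), and the diagonality constraint (44), can be verified numerically for any invertible, non-orthogonal $A$. The sketch below uses a hypothetical $2 \times 2$ matrix `A` and hypothetical values for the diagonal elements of $N_0$, $R_0$, and $\Pi$; all helper names are our own.

```python
def matmul(a, b):
    """Product of two small matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def diag(v):
    return [[v[0], 0.0], [0.0, v[1]]]


def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical values, chosen only for illustration
A = [[1.0, 2.0], [0.5, -1.0]]          # invertible, not orthogonal
n0, r0, pi = [2.0, 5.0], [3.0, 7.0], [0.8, 1.7]

Ainv, At = inv2(A), transpose(A)
N = matmul(matmul(A, diag(n0)), Ainv)            # N = A N0 A^{-1}, from (54)
Q0 = matmul(matmul(A, diag(r0)), At)             # Q0 = A R0 A', from (54)
Omega = matmul(matmul(inv2(At), diag(pi)), Ainv)  # Omega = (A^{-1})' Pi A^{-1}, from (42)
AAt = matmul(A, At)

check_56 = close(matmul(N, Q0), matmul(Q0, transpose(N)))
check_57a = close(matmul(Omega, N), matmul(transpose(N), Omega))
check_57b = close(matmul(Omega, Q0),
                  matmul(matmul(inv2(AAt), matmul(Q0, Omega)), AAt))
Pi = matmul(matmul(At, Omega), A)       # recovers a diagonal matrix, as (44) requires
n_is_symmetric = close(N, transpose(N))
```

With these assumed inputs all three checks should hold even though `N` itself is not symmetric, which illustrates why the relations are not properties one would expect of an arbitrary $N$.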
7.  LDMEF MULTINORMAL PRIOR MARGINALS AND MOMENTS

As  tempting  as  it  is  to  factor  (55)  directly,  as  in  Section  5,  we  are 
on  safer  ground  if  we  factor  (53)  directly,  and  then  transform;  by  checking 
also  the  transformation  of  moments,  this  will  enable  us  to  deduce  the  correct 
form  for  the  new  degenerate  covariance  term. 

For the conditional mean, it is easily seen that $(\lambda \mid \Pi) \sim N_p(N_0^{-1}y_0;\, \Pi N_0)$, which transforms into $(\mu \mid \Omega) \sim N_p(N^{-1}x_0;\, \Omega N)$. This is the natural generalization of (34), and is the same as in the fixed covariance case (30). The moments corresponding to (47) are easily found to be:

$$\mathcal{E}\{\mu \mid \Omega\} = N^{-1}x_0 \;;\quad \mathcal{V}\{\mu \mid \Omega\} = (\Omega N)^{-1}. \tag{58}$$

It  is  the  first  term  that  gives  a  full-dimensional  credibility  forecast; 
note  also  that  the  constraint  (57)  on  £  guarantees  also  symmetry  of  the 
covariance  term. 

The difficulty comes from the marginal precisions, which have joint density:

$$p(\Pi) \propto \prod_{i=1}^{p} (\pi_i)^{\frac12(n_{i0}+1)} \exp\left\{-\tfrac12 \pi' v_0\right\}, \tag{59}$$

where $v_0$ was previously defined to have components $r_{i0} - n_{i0}^{-1}y_{i0}^2$. We know that we want to express at least the exponent in terms of the matrix $\Pi$ before transformation; $r_0$ has a diagonal version, but it is not clear how to diagonalize $y_{i0}^2$, as we want the updating of $x_0$ to remain simple. Two possibilities are to define a matrix version of $v_0$ as:


~i  =  -So  "  ~o 


or 


(60) 
(one of these was used in the author's 1974 paper). Both of these are easily transformed, and both give the correct result under annihilation by the trace operator, $\mathrm{tr}(\Pi V_1) = \mathrm{tr}(\Pi V_2) = \pi' v_0$.

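However, when we examine the matrix generalization of the moments (48), it becomes obvious that only a diagonal version of $V_0$ is acceptable; the only way to achieve this is to define a new matrix version of $y_0$:

$$Y_0 = \mathrm{diag}(y_0) = \begin{bmatrix} y_{10} & & & 0 \\ & y_{20} & & \\ & & \ddots & \\ 0 & & & y_{p0} \end{bmatrix} \;;\quad Y_0 \cdot 1 = y_0 \;; \tag{61}$$

and to take as definition of $V_0$ any of the equivalent forms:

$$V_0 = R_0 - N_0^{-1}Y_0^2 = R_0 - Y_0N_0^{-1}Y_0 = R_0 - Y_0^2N_0^{-1} \;, \tag{62}$$

then replacing $\pi' v_0$ by $\mathrm{tr}(\Pi V_0)$.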

After transformation, we have:

$$p(\Omega) \propto \prod_{i=1}^{p} \left[(A'\Omega A)_{ii}\right]^{\frac12(n_{i0}+1)} \exp\left\{-\tfrac12\, \mathrm{tr}(\Omega U_0)\right\}, \tag{63}$$

which might be called a linearly-dependent Wishart density. The transformed precision-parameter $U_0$ is given by:

$$U_0 = Q_0 - N^{-1}X_0X_0' = Q_0 - X_0N_0^{-1}X_0' = Q_0 - X_0X_0'(N^{-1})' \;, \tag{64}$$

(the $N_0$ in the middle form is not a typographical error), where we define a new matrix of hyperparameters:

$$X_0 = AY_0 = A\,\mathrm{diag}(A^{-1}x_0) \;;\quad X_0 \cdot 1 = x_0 \;; \tag{65}$$

which, although neither diagonal nor symmetric, has sums over columns equal to $x_0$.

Now, we can safely rewrite the means in (48), and obtain, finally:

$$\mathcal{E}\{\Omega\} = (N' + 3I)\,U_0^{-1} \;;\quad \mathcal{E}\{\Omega^{-1}\} = E = (N + I)^{-1}U_0. \tag{66}$$

Explicit forms for the covariances of the covariance can be found from (48), but give formulae involving $A$.

It is also difficult to use (46) to produce a simple formula for $p(\mu)$, similar to (37), which might be termed a linearly-dependent Student-t; generalizations of this and more complicated type often arise in multivariate analysis. Similar remarks apply to the exact form for our outcome density, $p(x)$.

However, moments corresponding to (49) (50) follow directly by transformation:

$$\mathcal{E}\{\mu\} = N^{-1}x_0 \;;\quad \mathcal{V}\{\mu\} = D = N^{-1}(N + I)^{-1}U_0 \;; \tag{67}$$

and

$$\mathcal{E}\{x\} = N^{-1}x_0 \;;\quad \mathcal{V}\{x\} = N^{-1}U_0. \tag{68}$$

These are exactly equivalent to the Ando-Kaufman results, with the exception that $N$ is there replaced by $n_{00}I$, and $U_0$ given by (36), instead of (64). The variance-covariance matrices are symmetric, since (cf. (56)), $NU_0 = U_0N'$, and similarly for other powers of $N$ or $N^{-1}$.
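As a numerical illustration of (64)-(68), the sketch below builds the expanded matrix $X_0$, evaluates the three forms of $U_0$, and forms the moment matrices $E$, $D$, and $\mathcal{V}\{x\}$. The matrix `A` and the hyperparameter values are hypothetical, and the helper functions are our own.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inv2(a):
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def diag(v):
    return [[v[0], 0.0], [0.0, v[1]]]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Hypothetical transformation and one-dimensional hyperparameters
A = [[1.0, 2.0], [0.5, -1.0]]
n0, y0, r0 = [2.0, 5.0], [1.0, -2.0], [3.0, 7.0]

Ainv, At = inv2(A), transpose(A)
N0 = diag(n0)
N = matmul(matmul(A, N0), Ainv)
Ninv = inv2(N)
x0 = matvec(A, y0)

X0 = matmul(A, diag(y0))                      # X0 = A Y0, from (65)
X0_alt = matmul(A, diag(matvec(Ainv, x0)))    # X0 = A diag(A^{-1} x0), from (65)
Q0 = matmul(matmul(A, diag(r0)), At)
X0X0t = matmul(X0, transpose(X0))

# The three forms of U0 in (64)
U0_a = sub(Q0, matmul(Ninv, X0X0t))
U0_b = sub(Q0, matmul(matmul(X0, inv2(N0)), transpose(X0)))
U0_c = sub(Q0, matmul(X0X0t, transpose(Ninv)))

I2 = diag([1.0, 1.0])
E = matmul(inv2(add(N, I2)), U0_a)   # (66)
D = matmul(Ninv, E)                  # (67)
C = matmul(Ninv, U0_a)               # (68)
m = matvec(Ninv, x0)                 # prior mean of x
```

Under these assumptions the three forms of `U0` coincide, `C` is symmetric, and `C` equals `D + E`, the usual decomposition of the observational covariance into inter-risk and intra-risk parts.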
8.  LDMEF CREDIBILITY FORECASTS

We are now ready to analyze the updating to the LDMEF prior, and then to obtain the forecasts from (68). Starting from $\mathcal{D}_y = \{y_t \;;\; t = 1, 2, \ldots, n\}$, we see that we again get quadratic difficulties (similar to the definition of $V_0$), since $r_{i0} \leftarrow r_{i0} + \sum_{t=1}^{n} y_{it}^2$. This leads us to define expanded versions of $y_t$ and $x_t$, corresponding to (61) (65):

$$Y_t = \mathrm{diag}(y_t) \;;\quad y_t = Y_t \cdot 1 \;;$$

$$X_t = AY_t = A\,\mathrm{diag}(A^{-1}x_t) \;;\quad x_t = X_t \cdot 1 \quad (t = 1, 2, \ldots, n). \tag{69}$$


It follows easily that, posterior-to-data $\mathcal{D}$, the LDMEF priors (55) (59) and the various moments merely require the updating:

$$N \leftarrow N + nI \;;$$

$$x_0 \leftarrow x_0 + \sum_{t=1}^{n} x_t \;;\quad X_0 \leftarrow X_0 + \sum_{t=1}^{n} X_t \;; \tag{70}$$

$$Q_0 \leftarrow Q_0 + \sum_{t=1}^{n} X_tX_t' \;,$$

which should be compared with (38) and (29). While we have succeeded in introducing the enriched time constants, we see that we require a new, expanded form for updating $Q_0$, and hence $U_0$.

The credible mean is unaffected by this. Setting $m = N^{-1}x_0$ to coincide with (8), we find that the first formula in (68) gives the predictor:

$$\mathcal{E}\{x_{n+1} \mid \mathcal{D}_x\} = (I - Z)m + Z\bar{x} \;, \tag{71}$$

as expected.
But to update the covariance matrix, we find that we also need expanded versions of $m$ and $\bar{x}$, as follows:

$$M = N^{-1}X_0 = A\,\mathrm{diag}(A^{-1}m) \;;\quad M1 = m \;;$$

$$\bar{X} = \frac{1}{n}\sum_{t=1}^{n} X_t = A\,\mathrm{diag}(A^{-1}\bar{x}) \;;\quad \bar{X}1 = \bar{x}. \tag{72}$$

After some algebra, and liberal use of symmetry, we obtain:

$$\mathcal{V}\{x_{n+1} \mid \mathcal{D}_x\} = (I - Z)C + Z\,T(\mathcal{D}_x) + Z(I - Z)(\bar{X} - M)(\bar{X} - M)' \;, \tag{73}$$

with

$$T(\mathcal{D}_x) = \frac{1}{n}\sum_{t=1}^{n} (X_t - \bar{X})(X_t - \bar{X})' \;, \tag{74}$$

which should be compared to the result (40) with the Ando-Kaufman prior. We see that, as a price for working with a full credibility matrix $Z$, we must use the expanded forms for the inter-risk and intra-risk statistics. (73) was given incorrectly in Jewell (1974b) because of the dimensionality problems discussed earlier.

This explains why $T(\mathcal{D}_x)$ is the correct sufficient statistic, as $n \to \infty$, for the dispersion or precision of $x$, instead of the more usual sufficient statistic, $S(\mathcal{D}_x) = n^{-1}\sum_t (x_t - \bar{x})(x_t - \bar{x})'$. The latter is sufficient for $\Omega$ only if $\Omega$ is of full rank, while in our case, it was constructed from a lower-dimensional vector, $\pi$.
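The covariance forecast (73) can be checked against a direct computation: update the hyperparameters by (70), recompute $U_0$ from (64), and evaluate $N^{-1}U_0$ as in (68). The sketch below does this for $p = 2$; the matrix `A`, the prior values, and the three observations are hypothetical, and we take $Z = n(N + nI)^{-1}$, which is the form that reproduces (71).

```python
from functools import reduce


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inv2(a):
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def diag(v):
    return [[v[0], 0.0], [0.0, v[1]]]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, c):
    return [[c * x for x in row] for row in a]


def sub(a, b):
    return add(a, scale(b, -1.0))


def outer(a):
    return matmul(a, transpose(a))   # a a' for a matrix a


# Hypothetical prior and data
A = [[1.0, 2.0], [0.5, -1.0]]
n0, y0, r0 = [2.0, 5.0], [1.0, -2.0], [3.0, 7.0]
data = [[1.0, 0.5], [2.0, -1.0], [0.0, 1.5]]   # observations x_t
n = len(data)
I2 = diag([1.0, 1.0])
Ainv = inv2(A)


def expand(x):
    """Expanded observation X_t = A diag(A^{-1} x_t), as in (69)."""
    return matmul(A, diag(matvec(Ainv, x)))


N = matmul(matmul(A, diag(n0)), Ainv)
X0 = matmul(A, diag(y0))
Q0 = matmul(matmul(A, diag(r0)), transpose(A))
U0 = sub(Q0, matmul(inv2(N), outer(X0)))       # (64)
C = matmul(inv2(N), U0)                        # prior covariance (68)
M = matmul(inv2(N), X0)                        # (72)

Xs = [expand(x) for x in data]
Xbar = scale(reduce(add, Xs), 1.0 / n)
T = scale(reduce(add, [outer(sub(X, Xbar)) for X in Xs]), 1.0 / n)   # (74)
Z = scale(inv2(add(N, scale(I2, n))), n)
IZ = sub(I2, Z)
forecast = add(add(matmul(IZ, C), matmul(Z, T)),
               matmul(matmul(Z, IZ), outer(sub(Xbar, M))))           # (73)

# Direct route: update by (70), then apply (64) and (68) again
N_post = add(N, scale(I2, n))
X0_post = add(X0, reduce(add, Xs))
Q0_post = add(Q0, reduce(add, [outer(X) for X in Xs]))
U0_post = sub(Q0_post, matmul(inv2(N_post), outer(X0_post)))
direct = matmul(inv2(N_post), U0_post)

max_gap = max(abs(f - d) for rf, rd in zip(forecast, direct) for f, d in zip(rf, rd))
```

If the expanded statistics $X_tX_t'$ were replaced by the usual $x_tx_t'$ in this sketch, the two routes would in general no longer agree, which is the dimensionality issue discussed above.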

We  give  two  additional  formulae  not  in  the  original  paper: 
9.  RANDOM PRECISION ONLY

Our new results can be most easily seen in isolation, by considering the case where only $\Omega$ is random, and we set $m \equiv 0$ for convenience. The problem of predicting $\Omega^{-1}$ is of continuing interest in the Bayesian literature, see, e.g., Dickey, Lindley, and Press (1981).

For the thin natural-conjugate prior, we obtain easily:

$$p(\Omega) \propto |\Omega|^{\frac12 n_{00}} \exp\left\{-\tfrac12\, \mathrm{tr}(\Omega U_0)\right\}, \tag{78}$$

that is, $\Omega \sim W_p(n_{00} + p + 1 \;;\; U_0)$ with $\tfrac12(p^2 + p + 2)$ hyperparameters $(n_{00}, U_0)$ to be estimated ($U_0$ must be symmetric, not for the trace, but for the moments below). This may be compared with (35). Updating is as in the first and third formulae of (38), so that the thin forecast of the covariance of the observations, corresponding to (40), is simply:

$$\mathcal{V}\{x_{n+1} \mid \mathcal{D}_x\} = \mathcal{E}\{\Omega^{-1} \mid \mathcal{D}_x\} = (1 - z_{00})\,C + z_{00}\left[\frac{1}{n}\sum_{t=1}^{n} x_tx_t'\right], \tag{79}$$

where $C = n_{00}^{-1}U_0$. This result is well-known, although not usually expressed in shrinkage form (see, e.g., Press (1981)). (Remember $C = E$, and $D$, $m$ vanish).

For  the  enriched  prior,  we  follow  the  appropriate  steps  of  Section  6 
and  find  that: 


P 

p(0)  II  [(A'QA)  .  ]  exp 
i-1  i 


L  ±  tr 
I  2  C 


<85o> 


(80) 


for some $\Omega$ constrained by (44) or (56) or (57), that is, of rank $p$, and with $U_0 = Q_0 = AR_0A'$, where $R_0$ is diagonal, so that (56) applies. We thus have the $2p$ initial hyperparameters for $\pi$, plus the effective terms in the transformation matrix $A$ (see below).

After arguments similar to those of Section 7, we find the enriched covariance forecast to be:

$$\mathcal{V}\{x_{n+1} \mid \mathcal{D}_x\} = (I - Z)\,C + Z\left[\frac{1}{n}\sum_{t=1}^{n} X_tX_t'\right], \tag{81}$$

which should be compared with (73). $C$ is now $N^{-1}U_0$, and again the enriched prior makes the credibility factor matrix full-dimensional, but changes the sufficient statistics to their expanded versions, using $X_tX_t'$ instead of $x_tx_t'$.

One can determine second moments of $\Omega$ or $\Omega^{-1}$ through transformations of the formulae in the Appendix.
10.  PROPERTIES OF THE TRANSFORMATION MATRIX

Having  produced  an  enriched  prior  and  the  associated  prediction 
formulae,  we  now  turn  to  the  problem  of  specifying  the  needed  hyperparameters. 
But  first,  we  must  consider  how  to  find  the  appropriate  transformation 
matrix  A  . 

In some situations, there may be a natural choice for $A$. For example, in their 1981 paper, Dickey, Lindley, and Press assume that (in our notation) $C$ in (79) is of interclass form, $C = \sigma^2[(1 - \rho)I + \rho 11']$, and note that there exists an orthogonal matrix $\Gamma$ which diagonalizes $C$ and $U_0$ via $\Gamma C \Gamma^{-1} = \mathrm{diag}(\alpha, \beta, \ldots, \beta)$, $\alpha = \sigma^2[1 + (p - 1)\rho]$, $\beta = \sigma^2(1 - \rho)$.

From (54), we see that this essentially defines $A = \Gamma^{-1} = \Gamma'$ as a particular orthogonal matrix, and this shows how to calculate the expanded observations $X_t$ in (81), and the possible forms for $N$, which here becomes symmetric. In fact, for this interclass $C$, there are many possible $A$, as the first column of $A$ is $p^{-1/2}1$, and the other columns are any set of $(p - 1)$ mutually orthogonal, normed vectors, also orthogonal to $1$; this gives $\tfrac12(p - 1)(p - 2)$ free choices.

In the more general situation, we may wish to test a given $N$, to see if it is of permitted form. Letting $\{a_j\}$ be the column vectors of $A$, we see that (54) can be rewritten:

$$Na_j = n_{j0}\,a_j \quad (j = 1, 2, \ldots, p) \;, \tag{82}$$

which means that:

(1) the hyperparameters $\{n_{i0}\}$ of the independent priors (45) are the eigenvalues $\{\nu_i\}$ of $N$, referred to in Section 2; these must be positive (or, greater than unity) if we want the moments (48) to be well-defined;

(2) the columns of $A$ are the (right) eigenvectors of $N$, which must be mutually independent, since we have assumed $A$ is of rank $p$; however, they need not be mutually orthogonal, since $N$ is not necessarily symmetric;

(3) from the fact that (54) (82) are similarity transforms, it follows that:

$$|N| = |N_0| = \prod_{i=1}^{p} n_{i0} = \prod_{i=1}^{p} \nu_i > 0 \;\; (\text{or}, > 1) \;;$$

$$\mathrm{tr}(N) = \mathrm{tr}(N_0) = \sum_{i=1}^{p} n_{i0} = \sum_{i=1}^{p} \nu_i > 0 \;\; (\text{or}, > p).$$


This interpretation shows also that we can norm the column vectors of the transformation, by using a matrix $A^*$, with columns $a_j^* = a_j / \|a_j\|$ (this is equivalent to a scale change of $\|a_j\|$ in the underlying $y_j$, used with the original $A$). The matrix of time constants, $N$, and the process for forming $X_t$, $\bar{X}$, and $M$ are unaffected by this change, while the hyperparameters $(y_0, r_0)$ merely reflect the underlying scale changes. Since our prediction formulae are in terms of the final moments in $X$, $(m, D, E, C)$, this scale change does not affect (71) (73) or (81), either. In other words, the use of an arbitrary, invertible $A$ can only introduce $p^2 - p$ effective new hyperparameters into the enriched credibility formulae; the remaining $p$ column norms merely make "nuisance" scale changes.

An important special case occurs if $A$ is an orthogonal matrix, so that $A^{-1} = A'$, and all the transformations from $N$, $\Omega$, and $Q_0$ to and from their diagonal counterparts become the same orthogonal transformation; this makes $N$ symmetric, and from (56) (57), $N$, $\Omega$, and $Q_0$ all commute with each other. The converse is usually only partially true, since all "normal" real matrices ($N'N = NN'$) also have orthogonal reductions. But, in our case, we require the $\{n_{i0}\}$ to be real, and this means $N$ must be symmetric (see Bellman (1960) or Nobel (1969)). Therefore, $(A^{-1} = A')$ iff $(N = N')$ for our problem.

The question of whether or not $N$ is symmetric is tied up with the reductions of $E$ and $D$, which, being symmetric, always have orthogonal diagonalizations. If the orthogonal transformation matrix is the same for both $E$ and $D$, then they are said to be simultaneously orthogonally diagonalized (Bellman (1960)), which can only occur if they commute ($ED = DE$), i.e., if $N = N'$.
11.  SPECIFICATION OF HYPERPARAMETERS

Eventually, every Bayesian analysis must address the problem of specification of the prior hyperparameters. Several authors have mentioned the difficulties of forming a prior opinion about the Wishart prior (35), especially because of its thinness. (63) is a little easier to visualize, as it is formed by a linear transformation from independent Gamma densities (45), yet here we have the problem of reduced rank, and the required inter-relations (56) (57) between $\Omega$ and the hyperparameters.

The  point  of  view  we  wish  to  emphasize  in  this  section  is  that,  if  the 
objective  of  the  Bayesian  analysis  is  to  forecast  the  mean  and  variance  of 
x  ,  we  can  focus  our  attention  on  the  estimation  of  parameters  which  are 
pretty  much  in  the  space  X  ,  and  which  have  natural  classical  estimators 
when  large  amounts  of  collateral  data  are  available.  (However,  we  will  not 
discuss  the  problem  of  actually  making  these  estimates:  see,  inter  alia, 

De  Vylder  (1978)  (1981),  Norberg  (1980)  (1982),  Zehnwirth  (1981)  and  Sundt 
(1981)).  Counting  the  number  of  hyperparameters  to  be  specified  also  in¬ 
dicates  the  additional  modelling  flexibility  of  an  enriched  prior. 

As a warm-up, let us first examine the basic credibility model of Section 2, where $\mu$ is random, but $\mathcal{V}\{x \mid \mu\} = W^{-1} = E$ is given. In the simple natural conjugate prior (19), we see that there are $p + 1$ hyperparameters $(n_{00}, m)$ to be specified; but $m = \mathcal{E}\{x\}$, and $n_{00}$ is got from $n_{00}D = E$, where $D = \mathcal{V}\{\mu\}$ must be similar to $E$, so these parameters are easily visualized. The enriched version (30) has $p^2 + p$ hyperparameters $(m, N)$, but these are also easily visualized, as again $m = \mathcal{E}\{x\}$, but $N$ is got from $N = ED^{-1}$, with $D = \mathcal{V}\{\mu\}$ now an arbitrary, symmetric positive definite matrix to be estimated. In other words, apart from $m$ and $C = \mathcal{V}\{x\}$, the only additional prior specification problem involves the split of variance into its inter-risk and intra-risk components, one of which is given. The modelling gain in using the enriched version is $p^2 - 1$ additional parameters.

Passing now to the case when both $\mu$, $\Omega$ are the $\tfrac12(p^2 + 3p)$ random parameters, we recall that the Ando-Kaufman prior (33) requires $\tfrac12(p^2 + 3p + 2)$ hyperparameters $(n_{00}, x_0, Q_0)$. The mean and the variance of (37) give $\tfrac12(p^2 + 3p)$ of these: $\mathcal{E}\{\mu\} = m = n_{00}^{-1}x_0$, and $\mathcal{V}\{\mu\} = D = n_{00}^{-1}(n_{00} + 1)^{-1}U_0$, leaving only one degree of freedom, since $\mathcal{E}\mathcal{V}\{x \mid \Omega\} = E = n_{00}D$. Alternately, we could ignore $D$, fix $\tfrac12(p^2 + p)$ coefficients from the mean of $\Omega$ or $\Omega^{-1} = \Sigma$ and then estimate one second moment of $\Omega$ or $\Sigma$ from the formulae in the Appendix; all other second moments are then determined! In short, there is one more hyperparameter than the number of parameters, no matter the size of $p$.

Our enriched prior (55) can be evaluated at several different stages. In terms of $y$ and $(\lambda, \pi)$, we see from (45) that, for $2p$ random parameters, we have first $3p$ hyperparameters $(n_0, y_0, r_0)$, augmented by the effective number of elements in $A$ ($p^2 - p$, by a previous argument), for a total of $p^2 + 2p$ hyperparameters. In terms of (55), we count first the $p^2 + p + \tfrac12 p(p + 1)$ hyperparameters $\{N, x_0, Q_0\}$, which must then be reduced by the $\tfrac12(p^2 - p)$ constraints (56), for a total of $p^2 + 2p$ effective hyperparameters. Finally, in the specification the author prefers, we think of the problem of specifying prior estimates of $m$, $E$, and $D$. We then compute $N = ED^{-1}$, $x_0 = Nm$, and $U_0 = (N + I)E$. The eigenvalues of $N$ are then computed, and if these are negative (or less than unity), then the assumptions about $E$, $D$ must be inconsistent with the LDMEF model. Usually, however, there will be no difficulty at this point, and $A$ is determined as the matrix of eigenvectors of $N$ (which will be orthogonal if $N$ is symmetric). The determinations of $X_t$ $(t = 0, 1, \ldots, n)$, $\bar{X}$, and $M$ are immediate from (69) (72), whence one can find $Q_0$ from (64), and the credibility forecasts from (71) (73). ($m$, $E$, and $D$) give also $p + \tfrac12(p^2 + p) + \tfrac12(p^2 + p) = p^2 + 2p$ degrees of freedom, $p^2$ more than the number of random parameters. By comparison with the Ando-Kaufman prior, we see that $\tfrac12(p^2 + p - 2)$ more hyperparameters have been introduced.
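The preferred specification route, from prior estimates $(m, E, D)$ to $(N, x_0, U_0, A)$, can be sketched for $p = 2$ as follows. The closed-form eigen-solver `eig2` is our own helper, written only for real $2 \times 2$ matrices; the values of `D` and `E` are taken from the $K = 2$ case of the Example with $e_{12} = 1$, and the mean `m` is a hypothetical choice.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inv2(a):
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def eig2(n):
    """Eigenvalues (ascending) and normed right eigenvectors of a 2 x 2 matrix.

    Returns (values, A) where the columns of A are the eigenvectors.
    Raises ValueError if the eigenvalues are complex.
    """
    (a, b), (c, d) = n
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4.0 * det
    if disc < 0:
        raise ValueError("complex eigenvalues: N is not of permitted form")
    root = math.sqrt(disc)
    values = [(tr - root) / 2.0, (tr + root) / 2.0]
    cols = []
    for j, v in enumerate(values):
        u, w = b, v - a                 # solves the first row of (N - vI)a = 0
        if abs(u) + abs(w) < 1e-12:
            u, w = v - d, c             # fall back on the second row
        if abs(u) + abs(w) < 1e-12:     # N is a multiple of I: any basis works
            u, w = (1.0, 0.0) if j == 0 else (0.0, 1.0)
        norm = math.hypot(u, w)
        cols.append((u / norm, w / norm))
    A = [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]
    return values, A


D = [[1.0 / 3.0, 1.0 / 3.0], [1.0 / 3.0, 5.0 / 6.0]]   # inter-risk covariance
E = [[2.0, 1.0], [1.0, 2.0]]                           # intra-risk covariance (K = 2, e12 = 1)
m = [1.0, 1.0]                                         # hypothetical prior mean
I2 = [[1.0, 0.0], [0.0, 1.0]]

N = matmul(E, inv2(D))            # N = E D^{-1}
x0 = matvec(N, m)                 # x0 = N m
U0 = matmul(add(N, I2), E)        # U0 = (N + I) E
nu, A = eig2(N)
if min(nu) <= 0:
    raise ValueError("E and D are inconsistent with the LDMEF model")
N0 = matmul(matmul(inv2(A), N), A)   # should be diag(nu)
C = matmul(inv2(N), U0)              # prior covariance of x; equals E + D
```

The eigenvectors returned here are determined only up to sign, so individual columns of `A` may differ in sign from any tabulated values; as noted above, such rescalings do not affect the forecasts.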

Finally, if only the precision $\Omega$ (or covariance $\Sigma$) is random, (78) shows that, for the $\tfrac12(p^2 + p)$ random parameters, the Wishart prior requires specification of $\tfrac12(p^2 + p + 2)$ hyperparameters $(n_{00}, U_0)$, i.e., $C$ and one additional component of, say, the variance of $\Sigma$. The enriched prior, on the other hand, only contains $p$ independent random variables, but utilizes $p^2 + p$ hyperparameters, computed in the various ways indicated above. Since $m = D = 0$, this suggests that the analyst could estimate $C = E = N^{-1}U_0$, but must obtain the remaining $\tfrac12(p^2 + p)$ parameters from the third moments of $x$, the second moments of $\Sigma$, or else some physical interpretation of $N$ or $A$.
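Example:

To illustrate the issues raised in the previous sections, we consider a two-dimensional example where $m$, and the two components of the covariance have been estimated:

$$D = \begin{bmatrix} \tfrac13 & \tfrac13 \\ \tfrac13 & \tfrac56 \end{bmatrix} \;;\quad E = \begin{bmatrix} e_{11} & e_{12} \\ e_{12} & e_{22} \end{bmatrix}.$$

For $E$ to be positive definite, $e_{11}$, $e_{22}$, and $(e_{11}e_{22} - e_{12}^2)$ must be positive. Then, we find easily that $D$ is diagonalized by $\Gamma_D$, with eigenvalues $\delta$, where:

$$\Gamma_D' D\, \Gamma_D = \mathrm{diag}(\delta) \;;\quad \Gamma_D = \frac{1}{\sqrt5}\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix} \;;\quad \delta = [\tfrac16, 1]' \;,$$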

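and $E$ is diagonalized by $\Gamma_E$, with eigenvalues $\varepsilon$, where:

$$\Gamma_E' E\, \Gamma_E = \mathrm{diag}(\varepsilon) \;;\quad \Gamma_E = \begin{bmatrix} k_1e_{12} & k_2(\varepsilon_2 - e_{22}) \\ k_1(\varepsilon_1 - e_{11}) & k_2e_{12} \end{bmatrix} \;;$$

$$\varepsilon_{1,2} = \tfrac12(e_{11} + e_{22}) \pm \sqrt{e_{12}^2 + \tfrac14(e_{11} - e_{22})^2} \;;\quad k_i = \left[e_{12}^2 + (\varepsilon_i - e_{ii})^2\right]^{-\frac12} \quad (i = 1, 2).$$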

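This makes the matrix of time constants, $N$, and the credibility matrix, $Z$,

$$N = \begin{bmatrix} 5e_{11} - 2e_{12} & 2e_{12} - 2e_{11} \\ 5e_{12} - 2e_{22} & 2e_{22} - 2e_{12} \end{bmatrix} \;;\quad Z = \frac{1}{\Delta(n)} \begin{bmatrix} n(n + 2e_{22} - 2e_{12}) & -2n(e_{12} - e_{11}) \\ -n(5e_{12} - 2e_{22}) & n(n + 5e_{11} - 2e_{12}) \end{bmatrix} \;;$$

$$\Delta(n) = n^2 + n\,[5e_{11} + 2e_{22} - 4e_{12}] + 6\,[e_{11}e_{22} - e_{12}^2].$$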
The eigenvalues of $N$, $\nu = [\nu_1, \nu_2]'$, are the two roots of $\Delta(-\nu) = 0$; in order that they both be real and positive:

$$5e_{11} + 2e_{22} - 4e_{12} \geq 0 \;;\quad e_{11}e_{22} > e_{12}^2.$$

In this case, the second condition dominates, and is automatically satisfied if $E > 0$. From the eigenvectors, we can form

$$A = \begin{bmatrix} 2e_{12} - 2e_{11} & \nu_2 + 2e_{12} - 2e_{22} \\ \nu_1 + 2e_{12} - 5e_{11} & 5e_{12} - 2e_{22} \end{bmatrix},$$

or any similar $A$ with normed columns, so that $A^{-1}NA = \mathrm{diag}(\nu) = N_0$.

If we use the Ando-Kaufman prior (33), we can set $m$ and $D$ as above, but we require that $E = n_{00}D$, which means that:

$$e_{11} = e_{12} \;;\quad e_{22} = \tfrac52 e_{12} \;;\quad n_{00} = 3e_{12} \;;$$

and that $e_{12} > 0$ (or $> \tfrac13$). From these we get the 6 hyperparameters $(n_{00}, x_0, U_0)$ for the prior; the 3 precision-parameters, $(u_{11}, u_{12}, u_{22})$, are constrained by the positive-definiteness requirement $u_{11}u_{22} > u_{12}^2$. From above we see that $\varepsilon = [\tfrac12 e_{12}, 3e_{12}]'$, and $\Gamma_E = \Gamma_D$ (after possibly permuting columns), and $N = 3e_{12}I$. If we use the prior (55) with the above values of $(e_{ij})$, then $N$ is already diagonalized, and any invertible $A$ works, so we might as well use $A = I$. Note also that the (single) constraint (44) or (57) is vacuous, in this case. A typical term in the variance forecasts (73) and (81) would use:

$$x_tx_t' = \begin{bmatrix} x_{1t}^2 & x_{1t}x_{2t} \\ x_{1t}x_{2t} & x_{2t}^2 \end{bmatrix} \;;\quad X_tX_t' = \begin{bmatrix} x_{1t}^2 & 0 \\ 0 & x_{2t}^2 \end{bmatrix},$$

showing the underlying independence in this case, since $Z$ is diagonal.

All other choices of the $(e_{ij})$ require the more general prior (55). An important special case occurs when:

$$e_{22} = e_{11} + \tfrac32 e_{12} \quad (e_{11} \neq e_{12}) \;,$$

with $e_{11} > \tfrac12 e_{12}$ to make $E > 0$. Then $\varepsilon = [e_{11} - \tfrac12 e_{12} \;;\; e_{11} + 2e_{12}]'$ and we again find $\Gamma_E = \Gamma_D$. This means that $N$ is symmetric, and we find the underlying time constants to be:

$$\nu = [6e_{11} - 3e_{12} \;;\; e_{11} + 2e_{12}]' \;;\quad (\nu_i = \varepsilon_i/\delta_i \;;\; i = 1, 2) \;;$$

both now greater than zero if $e_{11} > -2e_{12}$. In fact, $A$ can be normalized so that $A = \Gamma_E = \Gamma_D = (A^{-1})'$! A typical term in the variance forecast (81) would be:


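$$X_tX_t' = \frac{1}{25}\left\{ x_{1t}^2 \begin{bmatrix} 17 & -6 \\ -6 & 8 \end{bmatrix} + x_{1t}x_{2t} \begin{bmatrix} -12 & 16 \\ 16 & 12 \end{bmatrix} + x_{2t}^2 \begin{bmatrix} 8 & 6 \\ 6 & 17 \end{bmatrix} \right\}.$$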

All other cases give more general results. For example, if

$$e_{11} = e_{22} = Ke_{12} \;,$$

then $K > 1$ to make $E > 0$, $\varepsilon = [(K - 1)e_{12} \;;\; (K + 1)e_{12}]'$, but

$$\Gamma_E = \frac{1}{\sqrt2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \neq \Gamma_D \;,$$

$N$ is unsymmetric:

$$N = e_{12}\begin{bmatrix} 5K - 2 & 2 - 2K \\ 5 - 2K & 2K - 2 \end{bmatrix},$$

and the characteristic roots must be calculated from:

$$\nu_{1,2} = \frac{e_{12}}{2}\left[(7K - 4) \pm \sqrt{25K^2 - 56K + 40}\right],$$

which are obviously not related to $\{\varepsilon_i\}$ and $\{\delta_i\}$. For example, if $K = 2$ we find

$$\nu_{1,2} = (5 \pm \sqrt7)\,e_{12} = (2.35424, 7.64575) \times e_{12} \;,$$

$$A = \begin{bmatrix} 0.33392 & 0.98467 \\ 0.94260 & 0.17441 \end{bmatrix} \;;\quad A^{-1} = \begin{bmatrix} -0.20049 & 1.13192 \\ 1.08355 & -0.38385 \end{bmatrix} \neq A' \;;$$

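and $A^{-1}NA$ diagonalizes to $\mathrm{diag}(\nu_1, \nu_2)$. A typical term in the variance forecast (81) would be:

$$X_tX_t' = x_{1t}^2 \begin{bmatrix} 1.14286 & 0.21429 \\ 0.21429 & 0.071429 \end{bmatrix} + x_{1t}x_{2t} \begin{bmatrix} -0.85714 & -0.28571 \\ -0.28571 & -0.42857 \end{bmatrix} + x_{2t}^2 \begin{bmatrix} 0.28571 & 0.42857 \\ 0.42857 & 1.14286 \end{bmatrix}.$$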
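The coefficient matrices in an expansion of $X_tX_t'$ follow from $X_t = A\,\mathrm{diag}(A^{-1}x_t)$, since $X_tX_t' = \sum_j y_{jt}^2\, a_ja_j'$ with $y_t = A^{-1}x_t$. The sketch below computes them for the $K = 2$ case; the function name `outer_coeffs` is our own, and we use unnormed eigenvectors of $N$, which is harmless because the column norms of $A$ cancel in $X_t$.

```python
import math


def inv2(a):
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def outer_coeffs(A):
    """Coefficient matrices of x1^2, x1*x2 and x2^2 in X_t X_t'.

    With y = B x and B = A^{-1}, X_t X_t' = sum_j y_j^2 a_j a_j', where a_j is
    column j of A; expanding y_j^2 gives one weight per monomial and column.
    """
    B = inv2(A)
    # a_j a_j' for each column a_j of A
    outers = [[[A[r][j] * A[c][j] for c in range(2)] for r in range(2)]
              for j in range(2)]
    weights = {
        "x1^2": [B[j][0] ** 2 for j in range(2)],
        "x1x2": [2.0 * B[j][0] * B[j][1] for j in range(2)],
        "x2^2": [B[j][1] ** 2 for j in range(2)],
    }
    return {
        name: [[sum(w[j] * outers[j][r][c] for j in range(2)) for c in range(2)]
               for r in range(2)]
        for name, w in weights.items()
    }


# Unnormed eigenvectors of N / e12 = [[8, -2], [1, 2]] (the K = 2 case):
# for eigenvalue nu, the first row of (N - nu I) a = 0 gives a = (2, 8 - nu)'.
s7 = math.sqrt(7.0)
A = [[2.0, 2.0], [3.0 + s7, 3.0 - s7]]
coeffs = outer_coeffs(A)
```

For this case the entries come out as simple multiples of $1/14$ (for instance $8/7$ and $3/14$ in the $x_{1t}^2$ matrix), and the trace of the cross-term matrix is negative.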
SUMMARY

To summarize, the linearly-dependent multinormal prior introduced in Section 6 has two important advantages over the traditional Normal-Wishart prior: it permits the specification of a much larger number of hyperparameters, which can conveniently be taken as the prior observational mean, $m$, and the two components of prior observational covariance, $D$ and $E$; furthermore, the prediction formula for the mean observation is of full-dimensional credibility form. Against this must be balanced the fact that the random precision is of reduced rank, leading to a prediction formula for the observational covariance that involves a new type of sufficient statistic. It will be interesting to see if this additional modelling flexibility leads to improved predictions, or if further developments of this difficult theory are possible.
APPENDIX

NOTATIONS, DISTRIBUTIONS, AND MOMENTS

Boldface lower case letters refer to vectors, usually $p$-dimensional, viz., $x = [x_1, x_2, \ldots, x_p]'$; boldface upper case letters refer to $p \times p$ matrices, whose elements are written in lower case, viz., $U$ has elements $[u_{ij}]$. In contrast to Jewell (1974b), greek letters here refer to parameters. Random variables are indicated by a tilde over the corresponding argument; in this way we can use the usual trick of using $p(\cdot)$ for all random variables, and letting the arguments indicate the appropriate (discrete or continuous) density, so that $p(x \mid \mu, \Omega)$ is the density of $\tilde{x}$, conditioned on $\tilde{\mu} = \mu$ and $\tilde{\Omega} = \Omega$. $\mathcal{E}\{\tilde{x}\}$ is the usual vector mean, and $\mathcal{V}\{\tilde{x}\} = \mathcal{E}\{[\tilde{x} - \mathcal{E}\{\tilde{x}\}][\tilde{x} - \mathcal{E}\{\tilde{x}\}]'\}$ is the variance-covariance (dispersion) matrix. Sequential operators are interpreted inside-out, viz. $\mathcal{E}\mathcal{V}\{\tilde{\mu} \mid \tilde{\Omega}\}$ means: first the dispersion of $\tilde{\mu}$, conditional upon $\tilde{\Omega} = \Omega$; then the expectation over all values of $\tilde{\Omega}$.

Our  definition  of  distributions  is  taken  mostly  from  Johnson  and  Kotz 
(1972)  with  the  exception  that  we  always  emphasize  the  precision  (parameter) 
matrix,  instead  of  the  dispersion. 

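The multivariate-normal (multinormal) distribution with mean vector $m$ and precision-matrix $W$ has $\tilde{x} \sim N_p(m; W)$ if

$$p(x) = (2\pi)^{-p/2}\,|W|^{1/2} \exp\left\{-\tfrac12 (x - m)'W(x - m)\right\}, \quad (x \in R^p).$$

Moments are:

$$\mathcal{E}\{\tilde{x}\} = m \;;\quad \mathcal{V}\{\tilde{x}\} = W^{-1}.$$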
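The multivariate Student-t ("with common denominator") distribution with $\alpha$ degrees of freedom, mean vector $m$, and (symmetric) precision-parameter matrix $W$ has $\tilde{x} \sim S_p(\alpha; m; W)$ if:

$$p(x) = \frac{\Gamma(\tfrac12(\alpha + p))}{(\alpha\pi)^{p/2}\,\Gamma(\tfrac12\alpha)}\, |W|^{1/2} \left[1 + \alpha^{-1}(x - m)'W(x - m)\right]^{-\frac12(\alpha + p)}, \quad (x \in R^p).$$

Moments are:

$$\mathcal{E}\{\tilde{x}\} = m \;;\quad \mathcal{V}\{\tilde{x}\} = \alpha(\alpha - 2)^{-1}W^{-1} \quad (\alpha > 2).$$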


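The Wishart distribution with $\alpha$ degrees of freedom and a (symmetric) precision-parameter matrix $W$ is defined only for a random, symmetric matrix over values in $R^{\frac12 p(p+1)}$ that make it positive definite. $\tilde{\Omega} \sim W_p(\alpha, W)$ if the density is:

$$p(\Omega) = \left[2^{\frac12\alpha p}\,\Gamma_p(\tfrac12\alpha)\right]^{-1} |W|^{\frac12\alpha}\, |\Omega|^{\frac12(\alpha - p - 1)} \exp\left\{-\tfrac12\, \mathrm{tr}(\Omega W)\right\}, \quad (\Omega > 0),$$

where $\Gamma_p$ is the multivariate Gamma function. Letting $\tilde{\Sigma} = \tilde{\Omega}^{-1}$ (the usual variance-covariance matrix), we have

$$\mathcal{E}\{\tilde{\Omega}\} = \alpha W^{-1} \;;\quad \mathcal{E}\{\tilde{\Sigma}\} = (\alpha - p - 1)^{-1}W \;;$$

and the covariances are: (see Press (1981))

$$\mathcal{C}\{\tilde{\omega}_{ij}; \tilde{\omega}_{kl}\} = \alpha\,(w^{ik}w^{jl} + w^{il}w^{jk}) \;;$$

$$\mathcal{C}\{\tilde{\sigma}_{ij}; \tilde{\sigma}_{kl}\} = \left[(\alpha - p)(\alpha - p - 1)(\alpha - p - 3)\right]^{-1} \left\{2(\alpha - p - 1)^{-1}w_{ij}w_{kl} + w_{ik}w_{jl} + w_{il}w_{kj}\right\}$$

for all $(i, j, k, l)$, where $w^{ij} = (W^{-1})_{ij}$.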
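The Gamma distribution with shape parameter $\gamma$ and scale parameter $\beta$ has $\tilde{\omega} \sim G(\gamma, \beta)$ if:

$$p(\omega) = \frac{\beta(\beta\omega)^{\gamma - 1}e^{-\beta\omega}}{\Gamma(\gamma)}, \quad (\omega > 0).$$

Moments are:

$$\mathcal{E}\{\tilde{\omega}\} = \gamma\beta^{-1} \;;\quad \mathcal{E}\{\tilde{\omega}^{-1}\} = (\gamma - 1)^{-1}\beta \;;$$

$$\mathcal{V}\{\tilde{\omega}\} = \gamma\beta^{-2} \;;\quad \mathcal{V}\{\tilde{\omega}^{-1}\} = (\gamma - 1)^{-2}(\gamma - 2)^{-1}\beta^{2}.$$

It follows that $W_1(\alpha, w_{11}) = G(\tfrac12\alpha, \tfrac12 w_{11})$ is a Chi-squared density.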




